Probability Score Class Docs

There is just one class to represent predictions that return probability scores:

BinaryScore: an object that represents predictions matching each observation with a probability score between 0 and 1.

Probability scores can be easily transformed into predictions by setting a threshold: observations scoring at or above it are mapped to “1”, while the remaining ones get a “0”.

BinaryScore

Class to represent probability estimates, that is, predictions that do not directly return fitted values but can be converted into them. It can be viewed as the step before BinaryPrediction.

It allows computing the AUC score and, as arrays, other metrics that depend on the conversion threshold.

class easypred.binary_score.BinaryScore

Bases: object

Class to represent a prediction in terms of probability estimates, thus having each observation paired with a score between 0 and 1 representing the likelihood of being the “positive value”.

computation_decimals

The number of decimal places to be considered when rounding probability scores to obtain the unique values.

Type: int

fitted_scores

The array-like object of length N containing the probability scores.

Type: np.ndarray | pd.Series

real_values

The array-like object containing the N real values.

Type: np.ndarray | pd.Series

value_positive

The value in the data that corresponds to 1 in the boolean logic. It is generally associated with the idea of “positive” or being in the “treatment” group. Defaults to 1.

Type: Any

Examples

>>> from easypred import BinaryScore
>>> score = BinaryScore([0, 1, 1, 0, 1, 0],
...                     [0.31, 0.44, 0.24, 0.28, 0.37, 0.18],
...                     value_positive=1)
>>> score.real_values
array([0, 1, 1, 0, 1, 0])
>>> score.fitted_scores
array([0.31, 0.44, 0.24, 0.28, 0.37, 0.18])
>>> score.value_positive
1
>>> score.computation_decimals
3

__init__(real_values, fitted_scores, value_positive=1)

Create a BinaryScore object to represent a prediction in terms of probability estimates.

Parameters

real_values (np.ndarray | pd.Series | list | tuple) – The array-like object containing the real values. If not pd.Series or np.array, it will be coerced into np.array.
fitted_scores (np.ndarray | pd.Series | list | tuple) – The array-like object of length N containing the probability scores. It must have the same length as real_values. If not pd.Series or np.array, it will be coerced into np.array.
value_positive (Any) – The value in the data that corresponds to 1 in the boolean logic. It is generally associated with the idea of “positive” or being in the “treatment” group. Defaults to 1.

Examples

>>> from easypred import BinaryScore
>>> BinaryScore([0, 1, 1, 0, 1, 0], [0.31, 0.44, 0.24, 0.28, 0.37, 0.18])
<easypred.binary_score.BinaryScore object at 0x000001E8AD923430>

property accuracy_scores: numpy.ndarray

Return an array containing the accuracy scores calculated setting the threshold for each unique score value.

Examples

>>> from easypred import BinaryScore
>>> score = BinaryScore([0, 1, 1, 0, 1, 0],
...                     [0.31, 0.44, 0.244, 0.28, 0.37, 0.241],
...                     value_positive=1)
>>> score.accuracy_scores
array([0.5       , 0.66666667, 0.5       , 0.66666667, 0.83333333,
       0.66666667])

Note that the length of the array changes if the number of decimals used in the computation of unique values is lowered to 2. This is because 0.241 and 0.244 establish a unique threshold equal to 0.24.

>>> score.computation_decimals = 2
>>> score.accuracy_scores
array([0.5       , 0.5       , 0.66666667, 0.83333333, 0.66666667])
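The effect of computation_decimals can be reproduced with a minimal plain-Python sketch. It is not the library's implementation: the helper name `accuracy_per_threshold` is hypothetical, and the sketch assumes that a score at or above the threshold counts as positive.

```python
def accuracy_per_threshold(real_values, fitted_scores, value_positive=1, decimals=3):
    """Accuracy obtained by using each unique (rounded) score as the threshold."""
    # Candidate thresholds: unique scores after rounding, in ascending order.
    thresholds = sorted({round(s, decimals) for s in fitted_scores})
    is_positive = [v == value_positive for v in real_values]
    accuracies = []
    for t in thresholds:
        # Assumption: a score at or above the threshold is predicted positive.
        predicted = [s >= t for s in fitted_scores]
        correct = sum(p == r for p, r in zip(predicted, is_positive))
        accuracies.append(correct / len(real_values))
    return thresholds, accuracies


real = [0, 1, 1, 0, 1, 0]
fit = [0.31, 0.44, 0.244, 0.28, 0.37, 0.241]
thresholds_3, acc_3 = accuracy_per_threshold(real, fit, decimals=3)  # 6 thresholds
thresholds_2, acc_2 = accuracy_per_threshold(real, fit, decimals=2)  # 5 thresholds
```

With three decimals 0.241 and 0.244 remain distinct thresholds, while with two decimals both collapse to 0.24, so the second result is one element shorter.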

property auc_score: float

Return the Area Under the Receiver Operating Characteristic Curve (ROC AUC).

It is computed using pairs properties as: (Nc + 0.5 * Nt) / Ntot, where Nc is the number of concordant pairs, Nt is the number of tied pairs and Ntot is the total number of pairs.

Examples

>>> from easypred import BinaryScore
>>> score = BinaryScore([0, 1, 1, 0, 1, 0],
...                     [0.31, 0.44, 0.24, 0.28, 0.37, 0.24],
...                     value_positive=1)
>>> score.auc_score
0.7222222222222222
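The pair-based computation can be checked by hand with a short sketch. The function name `count_pairs` is hypothetical and the code is only an illustration, not the library's implementation. Each pair is made of one positive and one negative observation, and a tied pair is counted as half a concordant one, which reproduces the value shown in the example.

```python
def count_pairs(real_values, fitted_scores, value_positive=1):
    """Count concordant, discordant and tied (positive, negative) pairs."""
    positives = [s for r, s in zip(real_values, fitted_scores) if r == value_positive]
    negatives = [s for r, s in zip(real_values, fitted_scores) if r != value_positive]
    concordant = discordant = tied = 0
    for p in positives:
        for n in negatives:
            if p > n:
                concordant += 1  # the positive observation scores higher
            elif p < n:
                discordant += 1  # the negative observation scores higher
            else:
                tied += 1
    return concordant, discordant, tied


real = [0, 1, 1, 0, 1, 0]
fit = [0.31, 0.44, 0.24, 0.28, 0.37, 0.24]
n_conc, n_disc, n_tied = count_pairs(real, fit)  # 6, 2 and 1 pairs
n_total = n_conc + n_disc + n_tied               # 9 pairs
auc = (n_conc + 0.5 * n_tied) / n_total          # 6.5 / 9
```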

best_threshold(criterion='f1')

Return the threshold to convert scores into values that performs the best given a specified criterion.

Parameters: criterion (str, optional) – The value to be maximized by the threshold. It defaults to “f1”. The options are:
- “f1”: maximize the f1 score
- “accuracy”: maximize the accuracy score
Returns: The threshold that maximizes the indicator specified.
Return type: float

Examples

>>> from easypred import BinaryScore
>>> score = BinaryScore([0, 1, 1, 0, 1, 0],
...                     [0.31, 0.44, 0.244, 0.28, 0.37, 0.241],
...                     value_positive=1)
>>> score.best_threshold(criterion="f1")
0.37
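Conceptually, the search evaluates the criterion at every candidate threshold and keeps the best one. A minimal sketch for the “f1” criterion follows, assuming that the candidates are the unique rounded scores. The helper names are hypothetical, and the tie-breaking rule (lowest threshold wins) is an assumption of the sketch, not documented behaviour.

```python
def f1_at_threshold(real_values, fitted_scores, threshold, value_positive=1):
    """F1 score when scores at or above `threshold` are predicted positive."""
    tp = fp = fn = 0
    for real, score in zip(real_values, fitted_scores):
        predicted_positive = score >= threshold
        actual_positive = real == value_positive
        if predicted_positive and actual_positive:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif actual_positive:
            fn += 1
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def best_threshold_by_f1(real_values, fitted_scores, decimals=3):
    """Hypothetical helper: the candidate threshold with the highest F1."""
    candidates = sorted({round(s, decimals) for s in fitted_scores})
    # max() keeps the first maximum, so ties go to the lowest threshold.
    return max(candidates, key=lambda t: f1_at_threshold(real_values, fitted_scores, t))


real = [0, 1, 1, 0, 1, 0]
fit = [0.31, 0.44, 0.244, 0.28, 0.37, 0.241]
best = best_threshold_by_f1(real, fit)  # 0.37, where F1 reaches 0.8
```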

property c_score: float

Return the Kendall tau-a, computed as the difference between the number of concordant and discordant pairs, divided by the number of combinations of pairs.

Returns: Kendall tau-a.
Return type: float

References

https://en.wikipedia.org/wiki/Kendall_rank_correlation_coefficient#Tau-a

describe()

Return a dataframe containing some key information about the prediction.

Examples

>>> real = [0, 1, 1, 0, 1, 0]
>>> fit = [0.31, 0.44, 0.24, 0.28, 0.37, 0.18]
>>> from easypred import BinaryScore
>>> score = BinaryScore(real, fit, value_positive=1)
>>> score.describe()
                        Value
N                    6.000000
Max fitted score     0.440000
AUC score            0.777778
Max accuracy         0.833333
Thresh max accuracy  0.370000
Max F1 score         0.800000
Thresh max F1 score  0.370000

Return type: pandas.core.frame.DataFrame

property f1_scores: numpy.ndarray

Return an array containing the f1 scores calculated setting the threshold for each unique score value.

Examples

>>> from easypred import BinaryScore
>>> score = BinaryScore([0, 1, 1, 0, 1, 0],
...                     [0.31, 0.44, 0.244, 0.28, 0.37, 0.241],
...                     value_positive=1)
>>> score.f1_scores
array([0.66666667, 0.75      , 0.57142857, 0.66666667, 0.8       ,
       0.5       ])

Note that the length of the array changes if the number of decimals used in the computation of unique values is lowered to 2. This is because 0.241 and 0.244 establish a unique threshold equal to 0.24.

>>> score.computation_decimals = 2
>>> score.f1_scores
array([0.66666667, 0.57142857, 0.66666667, 0.8       , 0.5       ])

property false_positive_rates: numpy.ndarray

Return an array containing the false positive rates calculated setting the threshold for each unique score value.

Examples

>>> from easypred import BinaryScore
>>> score = BinaryScore([0, 1, 1, 0, 1, 0],
...                     [0.31, 0.44, 0.244, 0.28, 0.37, 0.241],
...                     value_positive=1)
>>> score.false_positive_rates
array([1.        , 0.66666667, 0.66666667, 0.33333333, 0.        ,
       0.        ])

Note that the length of the array changes if the number of decimals used in the computation of unique values is lowered to 2. This is because 0.241 and 0.244 establish a unique threshold equal to 0.24.

>>> score.computation_decimals = 2
>>> score.false_positive_rates
array([1.        , 0.66666667, 0.33333333, 0.        , 0.        ])

property goodmankruskagamma_score: float

Return the Goodman and Kruskal’s gamma, computed as the ratio between the difference and the sum of the number of concordant and discordant pairs.

Returns: Goodman and Kruskal’s gamma.
Return type: float

References

https://en.wikipedia.org/wiki/Goodman_and_Kruskal%27s_gamma
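As a worked illustration of the ratio described above, the sketch below applies it to hypothetical counts of 4 concordant and 1 discordant pairs. The function name is an assumption, as is the choice to return 0.0 when there are no concordant or discordant pairs.

```python
def goodman_kruskal_gamma(n_concordant, n_discordant):
    """(Nc - Nd) / (Nc + Nd); tied pairs play no role in gamma."""
    denominator = n_concordant + n_discordant
    if denominator == 0:
        return 0.0  # assumption: no informative pairs means no association
    return (n_concordant - n_discordant) / denominator


gamma = goodman_kruskal_gamma(4, 1)  # (4 - 1) / (4 + 1) = 0.6
```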

property kendalltau_score: float

Return the Kendall tau-a, computed as the difference between the number of concordant and discordant pairs, divided by the number of combinations of pairs.

Returns: Kendall tau-a.
Return type: float

References

https://en.wikipedia.org/wiki/Kendall_rank_correlation_coefficient#Tau-a
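A small sketch of this formula, reading “the number of combinations of pairs” as all n(n - 1) / 2 pairs of observations, as in the tau-a definition linked above. The function name and the counts (4 concordant and 1 discordant pairs among 5 observations) are hypothetical.

```python
import math


def kendall_tau_a(n_concordant, n_discordant, n_observations):
    """(Nc - Nd) divided by the number of ways to pick 2 of n observations."""
    n_combinations = math.comb(n_observations, 2)
    return (n_concordant - n_discordant) / n_combinations


tau_a = kendall_tau_a(4, 1, 5)  # (4 - 1) / 10 = 0.3
```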

pairs_count(relative=False)

Return a dataframe containing the count of concordant, discordant, tied and total pairs.

Parameters: relative (bool, optional) – If True, return the relative share of each type of pair instead of the absolute count. Defaults to False.
Returns: A dataframe of shape (4, 1) containing in one column the information about concordant, discordant, tied and total pairs.
Return type: pd.DataFrame

Examples

>>> real = [1, 0, 0, 1, 0]
>>> fit = [0.81, 0.31, 0.81, 0.73, 0.45]
>>> from easypred import BinaryScore
>>> score = BinaryScore(real, fit, value_positive=1)
>>> score.pairs_count()
            Count
Concordant      4
Discordant      1
Tied            1
Total           6
>>> score.pairs_count(relative=True)
            Percentage
Concordant    0.666667
Discordant    0.166667
Tied          0.166667
Total              1.0

plot_metric(metric, figsize=(20, 10), show_legend=True, title_size=14, axes_labels_size=12, ax=None, **kwargs)

Plot how one or more metrics vary as the threshold separating “1s” from “0s” changes.

Parameters

metric (Metric function | list[Metric functions]) – A function from easypred.metrics or a list of such functions. It defines which values are to be plotted.
figsize (tuple[int, int], optional) – Tuple of integers specifying the size of the plot. Default is (20, 10).
show_legend (bool, optional) – If True, show the plot’s legend. Defaults to True.
title_size (int, optional) – Font size of the plot title. Default is 14.
axes_labels_size (int, optional) – Font size of the axes labels. Default is 12.
ax (matplotlib Axes, optional) – Axes object to draw the plot onto, otherwise creates new Figure and Axes. Use this option to further customize the plot.
kwargs (key, value mappings) – Other keyword arguments to be passed through to matplotlib.pyplot.plot().

Returns

Matplotlib Axes object with the plot drawn on it.

Return type

matplotlib Axes

Examples

With one metric

>>> real = [0, 1, 1, 0, 1, 0]
>>> fit = [0.31, 0.44, 0.73, 0.28, 0.37, 0.18]
>>> from easypred import BinaryScore
>>> score = BinaryScore(real, fit, value_positive=1)
>>> from easypred.metrics import accuracy_score
>>> score.plot_metric(metric=accuracy_score)
<AxesSubplot:title={'center':'accuracy_score given different thresholds'},
xlabel='Threshold', ylabel='Metric value'>
>>> from matplotlib import pyplot as plt
>>> plt.show()

Adding a second metric

>>> from easypred.metrics import f1_score
>>> score.plot_metric(metric=[accuracy_score, f1_score])
<AxesSubplot:title={'center':'accuracy_score & f1_score given different thresholds'},
xlabel='Threshold', ylabel='Metric value'>
>>> plt.show()

plot_roc_curve(figsize=(20, 10), plot_baseline=True, show_legend=True, title_size=14, axes_labels_size=12, ax=None, **kwargs)

Plot the ROC curve for the score. This curve depicts the True Positive Rate (Recall score) against the False Positive Rate.

Parameters

figsize (tuple[int, int], optional) – Tuple of integers specifying the size of the plot. Default is (20, 10).
plot_baseline (bool, optional) – If True, a reference straight line with slope 1 is added to the plot, representing the performance of a random classifier. Defaults to True.
title_size (int, optional) – Font size of the plot title. Default is 14.
axes_labels_size (int, optional) – Font size of the axes labels. Default is 12.
ax (matplotlib Axes, optional) – Axes object to draw the plot onto, otherwise creates new Figure and Axes. Use this option to further customize the plot.
kwargs (key, value mappings) – Other keyword arguments to be passed through to matplotlib.pyplot.plot().
show_legend (bool, optional) – If True, show the plot’s legend. Defaults to True.

Returns

Matplotlib Axes object with the plot drawn on it.

Return type

matplotlib Axes

Examples

>>> from easypred import BinaryScore
>>> score = BinaryScore([0, 1, 1, 0, 1, 0],
...                     [0.31, 0.44, 0.244, 0.28, 0.37, 0.241],
...                     value_positive=1)
>>> score.plot_roc_curve()
<AxesSubplot:title={'center':'ROC Curve'},
xlabel='False Positive Rate', ylabel='True Positive Rate'>
>>> from matplotlib import pyplot as plt
>>> plt.show()
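The points that make up the curve can be derived without plotting. The following sketch is a plain-Python illustration rather than the library's code; the function name `roc_points` is hypothetical. It computes one (False Positive Rate, True Positive Rate) pair per unique score, assuming that scores at or above the threshold are predicted positive.

```python
def roc_points(real_values, fitted_scores, value_positive=1, decimals=3):
    """Return (FPR, TPR) pairs, one per unique rounded score used as threshold."""
    thresholds = sorted({round(s, decimals) for s in fitted_scores})
    n_pos = sum(r == value_positive for r in real_values)
    n_neg = len(real_values) - n_pos
    points = []
    for t in thresholds:
        # True positives: positive observations scoring at or above t.
        tp = sum(s >= t and r == value_positive
                 for r, s in zip(real_values, fitted_scores))
        # False positives: negative observations scoring at or above t.
        fp = sum(s >= t and r != value_positive
                 for r, s in zip(real_values, fitted_scores))
        points.append((fp / n_neg, tp / n_pos))
    return points


real = [0, 1, 1, 0, 1, 0]
fit = [0.31, 0.44, 0.244, 0.28, 0.37, 0.241]
points = roc_points(real, fit)  # starts at (1.0, 1.0) for the lowest threshold
```

The first and second elements of each pair match, position by position, the arrays shown for false_positive_rates and recall_scores with the same data.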

plot_score_histogram(figsize=(20, 10), title_size=14, axes_labels_size=12, ax=None, **kwargs)

Plot the histogram of the probability scores.

Parameters

figsize (tuple[int, int], optional) – Tuple of integers specifying the size of the plot. Default is (20, 10).
title_size (int, optional) – Font size of the plot title. Default is 14.
axes_labels_size (int, optional) – Font size of the axes labels. Default is 12.
ax (matplotlib Axes, optional) – Axes object to draw the plot onto, otherwise creates new Figure and Axes. Use this option to further customize the plot.
kwargs (key, value mappings) – Other keyword arguments to be passed through to matplotlib.pyplot.hist().

Returns

Matplotlib Axes object with the plot drawn on it.

Return type

matplotlib Axes

Examples

>>> from easypred import BinaryScore
>>> score = BinaryScore([0, 1, 1, 0, 1, 0],
...                     [0.31, 0.44, 0.244, 0.28, 0.37, 0.241],
...                     value_positive=1)
>>> score.plot_score_histogram()
<AxesSubplot:title={'center':'Fitted Scores Distribution'},
xlabel='Fitted Scores', ylabel='Frequency'>
>>> from matplotlib import pyplot as plt
>>> plt.show()

Passing keyword arguments to matplotlib’s hist function:

>>> score.plot_score_histogram(bins=10)
<AxesSubplot:title={'center':'Fitted Scores Distribution'},
xlabel='Fitted Scores', ylabel='Frequency'>

property recall_scores: numpy.ndarray

Return an array containing the recall scores calculated setting the threshold for each unique score value.

Examples

>>> from easypred import BinaryScore
>>> score = BinaryScore([0, 1, 1, 0, 1, 0],
...                     [0.31, 0.44, 0.244, 0.28, 0.37, 0.241],
...                     value_positive=1)
>>> score.recall_scores
array([1.        , 1.        , 0.66666667, 0.66666667, 0.66666667,
       0.33333333])

Note that the length of the array changes if the number of decimals used in the computation of unique values is lowered to 2. This is because 0.241 and 0.244 establish a unique threshold equal to 0.24.

>>> score.computation_decimals = 2
>>> score.recall_scores
array([1.        , 0.66666667, 0.66666667, 0.66666667, 0.33333333])

score_to_values(threshold=0.5)

Return an array containing the fitted values derived from the provided threshold.

Parameters: threshold (float, optional) – The minimum value such that the score is translated into value_positive. Any score below the threshold is instead associated with the other value. By default 0.5.
Returns: The array containing the inferred fitted values. Its type matches fitted_scores’ type.
Return type: np.ndarray | pd.Series

Examples

>>> from easypred import BinaryScore
>>> score = BinaryScore([0, 1, 1, 0, 1, 0],
...                     [0.31, 0.44, 0.24, 0.28, 0.37, 0.24],
...                     value_positive=1)
>>> score.score_to_values(threshold=0.6)
array([0, 0, 0, 0, 0, 0])
>>> score.score_to_values(threshold=0.31)
array([1, 1, 0, 0, 1, 0])
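The rule behind this conversion fits in one line. The sketch below uses plain lists and a hypothetical function name, and it makes explicit that the threshold itself is included: 0.31 maps to the positive value when the threshold is 0.31.

```python
def scores_to_values(fitted_scores, threshold=0.5, value_positive=1, value_negative=0):
    """Map scores at or above the threshold to the positive value, others to the negative one."""
    return [value_positive if s >= threshold else value_negative for s in fitted_scores]


fit = [0.31, 0.44, 0.24, 0.28, 0.37, 0.24]
all_negative = scores_to_values(fit, threshold=0.6)   # no score reaches 0.6
mixed = scores_to_values(fit, threshold=0.31)         # 0.31, 0.44 and 0.37 qualify
```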

property somersd_score: float

Return the Somers’ D score, computed as the difference between the number of concordant and discordant pairs, divided by the total number of pairs.

Also called: Gini coefficient.

Returns: Somers’ D score.
Return type: float

References

https://en.wikipedia.org/wiki/Somers%27_D#Somers’_D_for_binary_dependent_variables
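A brief worked sketch of the formula, using hypothetical counts of 4 concordant, 1 discordant and 1 tied pair. The function name is an assumption. The last lines check the known identity D = 2 * AUC - 1, which holds when the AUC counts each tied pair as half a concordant one.

```python
def somers_d(n_concordant, n_discordant, n_tied):
    """(Nc - Nd) / Ntot, where Ntot also includes the tied pairs."""
    n_total = n_concordant + n_discordant + n_tied
    return (n_concordant - n_discordant) / n_total


d = somers_d(4, 1, 1)           # (4 - 1) / 6 = 0.5

# Cross-check against the pair-based AUC for the same counts.
auc = (4 + 0.5 * 1) / 6         # 0.75
d_from_auc = 2 * auc - 1        # 0.5
```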

to_binary_prediction(threshold=0.5)

Create an instance of BinaryPrediction from the BinaryScore object.

Parameters

threshold (float | str, optional) –

If float, it is the minimum value such that the score is translated into value_positive. Any score below the threshold is instead associated with the other value. If str, the threshold is automatically set such that it maximizes the metric corresponding to the provided keyword. The available keywords are:
- “f1”: maximize the f1 score
- “accuracy”: maximize the accuracy score

By default 0.5.

Returns

An object of type BinaryPrediction, a subclass of Prediction specific for predictions with just two outcomes. The class instance is given the special attribute “threshold” that returns the threshold used in the conversion.

Return type

BinaryPrediction

Examples

>>> from easypred import BinaryScore
>>> score = BinaryScore([0, 1, 1, 0, 1, 0],
...                     [0.31, 0.44, 0.244, 0.28, 0.37, 0.241],
...                     value_positive=1)
>>> score.to_binary_prediction(threshold=0.37)
<easypred.binary_prediction.BinaryPrediction object at 0x000001E8C813FAF0>

property unique_scores: VectorPdNp

Return the unique values attained by the fitted scores, sorted in ascending order.

Returns: The array containing the sorted unique values. Its type matches fitted_scores’ type.
Return type: np.ndarray | pd.Series

Examples

>>> from easypred import BinaryScore
>>> score = BinaryScore([0, 1, 1, 0, 1, 0],
...                     [0.31, 0.44, 0.24, 0.28, 0.37, 0.24],
...                     value_positive=1)
>>> score.unique_scores
array([0.24, 0.28, 0.31, 0.37, 0.44])

property value_negative: Any

Return the value that is not the positive value.

Examples

>>> from easypred import BinaryScore
>>> score = BinaryScore([0, 1, 1, 0, 1, 0],
...                     [0.31, 0.44, 0.24, 0.28, 0.37, 0.18],
...                     value_positive=1)
>>> score.value_negative
0
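One way such a value could be inferred is by looking at the real values and picking the only one that differs from value_positive. This is a sketch under that assumption, not the library's implementation; the function name and the error raised for non-binary data are hypothetical.

```python
def infer_value_negative(real_values, value_positive=1):
    """Return the single value in real_values that is not value_positive."""
    others = {v for v in real_values if v != value_positive}
    if len(others) != 1:
        # Assumption: exactly two distinct outcomes are expected.
        raise ValueError(f"expected one non-positive value, found {len(others)}")
    return others.pop()


negative = infer_value_negative([0, 1, 1, 0, 1, 0], value_positive=1)
negative_label = infer_value_negative(["yes", "no", "no"], value_positive="yes")
```

The same logic works for non-numeric labels, as the second call shows.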