Probability Score Class Docs

There is just one class to represent predictions returning probability scores:

  • BinaryScore: an object that represents predictions matching each observation with a probability score between 0 and 1.

Probability scores can easily be transformed into predictions by setting a threshold: observations scoring at or above it are mapped to “1”, while the remaining ones get a “0”.

BinaryScore

Class to represent probability estimates, thus predictions that do not directly return fitted values but that can be converted to such. It can be viewed as the step before BinaryPrediction.

It allows computing the AUC score, as well as other metrics that depend on the conversion threshold as arrays.

class easypred.binary_score.BinaryScore

Bases: object

Class to represent a prediction in terms of probability estimates, thus having each observation paired with a score between 0 and 1 representing the likelihood of being the “positive value”.

computation_decimals

The number of decimal places to be considered when rounding probability scores to obtain the unique values.

Type

int

fitted_scores

The array-like object of length N containing the probability scores.

Type

np.ndarray | pd.Series

real_values

The array-like object containing the N real values.

Type

np.ndarray | pd.Series

value_positive

The value in the data that corresponds to 1 in the boolean logic. It is generally associated with the idea of “positive” or being in the “treatment” group. By default is 1.

Type

Any

Examples

>>> from easypred import BinaryScore
>>> score = BinaryScore([0, 1, 1, 0, 1, 0],
...                     [0.31, 0.44, 0.24, 0.28, 0.37, 0.18],
...                     value_positive=1)
>>> score.real_values
array([0, 1, 1, 0, 1, 0])
>>> score.fitted_scores
array([0.31, 0.44, 0.24, 0.28, 0.37, 0.18])
>>> score.value_positive
1
>>> score.computation_decimals
3
__init__(real_values, fitted_scores, value_positive=1)

Create a BinaryScore object to represent a prediction in terms of probability estimates.

Parameters
  • real_values (np.ndarray | pd.Series | list | tuple) – The array-like object containing the real values. If not pd.Series or np.array, it will be coerced into np.array.

  • fitted_scores (np.ndarray | pd.Series | list | tuple) – The array-like object of length N containing the probability scores. It must have the same length as real_values. If not pd.Series or np.array, it will be coerced into np.array.

  • value_positive (Any) – The value in the data that corresponds to 1 in the boolean logic. It is generally associated with the idea of “positive” or being in the “treatment” group. By default is 1.

Examples

>>> from easypred import BinaryScore
>>> BinaryScore([0, 1, 1, 0, 1, 0], [0.31, 0.44, 0.24, 0.28, 0.37, 0.18])
<easypred.binary_score.BinaryScore object at 0x000001E8AD923430>
property accuracy_scores: numpy.ndarray

Return an array containing the accuracy scores calculated by setting the threshold at each unique score value.

Examples

>>> from easypred import BinaryScore
>>> score = BinaryScore([0, 1, 1, 0, 1, 0],
...                     [0.31, 0.44, 0.244, 0.28, 0.37, 0.241],
...                     value_positive=1)
>>> score.accuracy_scores
array([0.5       , 0.66666667, 0.5       , 0.66666667, 0.83333333,
       0.66666667])

Note that the length of the array changes if the number of decimals used in the computation of unique values is lowered to 2. This is because 0.241 and 0.244 establish a unique threshold equal to 0.24.

>>> score.computation_decimals = 2
>>> score.accuracy_scores
array([0.5       , 0.5       , 0.66666667, 0.83333333, 0.66666667])
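
The effect of computation_decimals can be reproduced outside the library with a minimal NumPy sketch. It assumes that the thresholds are the unique rounded scores and that an observation is classified as positive when its raw score is greater than or equal to the threshold. The helper name accuracy_per_threshold is hypothetical and is not part of easypred.

```python
import numpy as np

real = np.array([0, 1, 1, 0, 1, 0])
scores = np.array([0.31, 0.44, 0.244, 0.28, 0.37, 0.241])


def accuracy_per_threshold(real, scores, decimals=3):
    """Hypothetical helper: accuracy at each unique rounded score."""
    # np.unique returns the distinct rounded scores sorted in ascending order.
    thresholds = np.unique(np.round(scores, decimals))
    # Assumption: a score at or above the threshold is classified as 1.
    return np.array(
        [np.mean((scores >= t).astype(int) == real) for t in thresholds]
    )


acc_3 = accuracy_per_threshold(real, scores, decimals=3)  # six thresholds
acc_2 = accuracy_per_threshold(real, scores, decimals=2)  # 0.241 and 0.244 merge
print(len(acc_3), len(acc_2))  # 6 5
```

With three decimals every score is its own threshold; with two decimals 0.241 and 0.244 both round to 0.24, so one threshold disappears.
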
property auc_score: float

Return the Area Under the Receiver Operating Characteristic Curve (ROC AUC).

It is computed from pair counts as (Nc + 0.5 * Nt) / Ntot, where Nc is the number of concordant pairs, Nt is the number of tied pairs and Ntot is the total number of pairs.

Examples

>>> from easypred import BinaryScore
>>> score = BinaryScore([0, 1, 1, 0, 1, 0],
...                     [0.31, 0.44, 0.24, 0.28, 0.37, 0.24],
...                     value_positive=1)
>>> score.auc_score
0.7222222222222222
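
As a cross-check of the value above, the pairs-based AUC can be computed by hand: every (positive, negative) pair is concordant when the positive observation has the higher score, and a tied pair counts as half a concordant one. The sketch below uses only the standard library; its variable names are illustrative and do not come from easypred.

```python
from itertools import product

real = [0, 1, 1, 0, 1, 0]
scores = [0.31, 0.44, 0.24, 0.28, 0.37, 0.24]

# Split the scores by the real class of each observation.
positives = [s for r, s in zip(real, scores) if r == 1]
negatives = [s for r, s in zip(real, scores) if r != 1]

concordant = tied = 0
for pos, neg in product(positives, negatives):
    if pos > neg:
        concordant += 1  # the positive observation is ranked higher
    elif pos == neg:
        tied += 1  # same score for both observations

total = len(positives) * len(negatives)
auc = (concordant + 0.5 * tied) / total
print(concordant, tied, total)  # 6 1 9
print(round(auc, 4))  # 0.7222
```
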
best_threshold(criterion='f1')

Return the threshold to convert scores into values that performs the best given a specified criterion.

Parameters

criterion (str, optional) – The value to be maximized by the threshold. It defaults to “f1”. The options are:

  • “f1”: maximize the f1 score

  • “accuracy”: maximize the accuracy score

Returns

The threshold that maximizes the indicator specified.

Return type

float

Examples

>>> from easypred import BinaryScore
>>> score = BinaryScore([0, 1, 1, 0, 1, 0],
...                     [0.31, 0.44, 0.244, 0.28, 0.37, 0.241],
...                     value_positive=1)
>>> score.best_threshold(criterion="f1")
0.37
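
The selection logic can be sketched as an argmax over the candidate thresholds. This is a minimal illustration, assuming that the candidates are the unique scores and that ties between thresholds are resolved in favour of the lowest one; easypred's own tie-breaking rule is not documented here. The helper f1_at is hypothetical.

```python
import numpy as np

real = np.array([0, 1, 1, 0, 1, 0])
scores = np.array([0.31, 0.44, 0.244, 0.28, 0.37, 0.241])


def f1_at(threshold):
    """F1 score when scores at or above `threshold` are classified as 1."""
    predicted = scores >= threshold
    true_pos = np.sum(predicted & (real == 1))
    false_pos = np.sum(predicted & (real != 1))
    false_neg = np.sum(~predicted & (real == 1))
    return 2 * true_pos / (2 * true_pos + false_pos + false_neg)


thresholds = np.unique(scores)  # candidate thresholds, ascending
f1_values = np.array([f1_at(t) for t in thresholds])
# Assumption: on ties np.argmax keeps the first (lowest) threshold.
best = thresholds[np.argmax(f1_values)]
print(best)  # 0.37
```
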
property c_score: float

Return the Kendall tau-a, computed as the difference between the number of concordant and discordant pairs, divided by the number of combinations of pairs.

Returns

Kendall tau-a.

Return type

float

References

https://en.wikipedia.org/wiki/Kendall_rank_correlation_coefficient#Tau-a

describe()

Return a dataframe containing some key information about the prediction.

Examples

>>> real = [0, 1, 1, 0, 1, 0]
>>> fit = [0.31, 0.44, 0.24, 0.28, 0.37, 0.18]
>>> from easypred import BinaryScore
>>> score = BinaryScore(real, fit, value_positive=1)
>>> score.describe()
                        Value
N                    6.000000
Max fitted score     0.440000
AUC score            0.777778
Max accuracy         0.833333
Thresh max accuracy  0.370000
Max F1 score         0.800000
Thresh max F1 score  0.370000
Return type

pandas.core.frame.DataFrame

property f1_scores: numpy.ndarray

Return an array containing the f1 scores calculated by setting the threshold at each unique score value.

Examples

>>> from easypred import BinaryScore
>>> score = BinaryScore([0, 1, 1, 0, 1, 0],
...                     [0.31, 0.44, 0.244, 0.28, 0.37, 0.241],
...                     value_positive=1)
>>> score.f1_scores
array([0.66666667, 0.75      , 0.57142857, 0.66666667, 0.8       ,
       0.5       ])

Note that the length of the array changes if the number of decimals used in the computation of unique values is lowered to 2. This is because 0.241 and 0.244 establish a unique threshold equal to 0.24.

>>> score.computation_decimals = 2
>>> score.f1_scores
array([0.66666667, 0.57142857, 0.66666667, 0.8       , 0.5       ])
property false_positive_rates: numpy.ndarray

Return an array containing the false positive rates calculated by setting the threshold at each unique score value.

Examples

>>> from easypred import BinaryScore
>>> score = BinaryScore([0, 1, 1, 0, 1, 0],
...                     [0.31, 0.44, 0.244, 0.28, 0.37, 0.241],
...                     value_positive=1)
>>> score.false_positive_rates
array([1.        , 0.66666667, 0.66666667, 0.33333333, 0.        ,
       0.        ])

Note that the length of the array changes if the number of decimals used in the computation of unique values is lowered to 2. This is because 0.241 and 0.244 establish a unique threshold equal to 0.24.

>>> score.computation_decimals = 2
>>> score.false_positive_rates
array([1.        , 0.66666667, 0.33333333, 0.        , 0.        ])
property goodmankruskagamma_score: float

Return the Goodman and Kruskal’s gamma, computed as the ratio between the difference and the sum of the number of concordant and discordant pairs.

Returns

Goodman and Kruskal’s gamma.

Return type

float

References

https://en.wikipedia.org/wiki/Goodman_and_Kruskal%27s_gamma
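
A minimal NumPy sketch of the ratio described above, using the sign of the score difference for every (positive, negative) pair. The data are the ones from the auc_score example; the variable names are illustrative and do not come from easypred.

```python
import numpy as np

real = np.array([0, 1, 1, 0, 1, 0])
scores = np.array([0.31, 0.44, 0.24, 0.28, 0.37, 0.24])

pos = scores[real == 1]
neg = scores[real != 1]

# Entry (i, j) is +1 for a concordant pair, -1 for a discordant pair
# and 0 for a tied pair.
signs = np.sign(pos[:, None] - neg[None, :])
concordant = int(np.sum(signs > 0))
discordant = int(np.sum(signs < 0))

# Tied pairs are left out of both the numerator and the denominator.
gamma = (concordant - discordant) / (concordant + discordant)
print(concordant, discordant, gamma)  # 6 2 0.5
```
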

property kendalltau_score: float

Return the Kendall tau-a, computed as the difference between the number of concordant and discordant pairs, divided by the number of combinations of pairs.

Returns

Kendall tau-a.

Return type

float

References

https://en.wikipedia.org/wiki/Kendall_rank_correlation_coefficient#Tau-a
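
The sketch below follows the usual tau-a definition from the reference above. It assumes that “the number of combinations of pairs” means all N(N-1)/2 pairs of observations; this has not been checked against easypred's implementation, so treat the denominator as an assumption.

```python
from itertools import combinations

real = [0, 1, 1, 0, 1, 0]
scores = [0.31, 0.44, 0.24, 0.28, 0.37, 0.24]

concordant = discordant = 0
pairs = list(combinations(zip(real, scores), 2))
for (real_a, score_a), (real_b, score_b) in pairs:
    # The product is positive when real value and score move in the same
    # direction, negative when they move in opposite directions, and zero
    # when either of the two is tied.
    direction = (real_a - real_b) * (score_a - score_b)
    if direction > 0:
        concordant += 1
    elif direction < 0:
        discordant += 1

tau_a = (concordant - discordant) / len(pairs)
print(concordant, discordant, len(pairs))  # 6 2 15
```

Pairs of observations sharing the same real value are tied by construction, so only pairs made of one positive and one negative observation can be concordant or discordant.
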

pairs_count(relative=False)

Return a dataframe containing the count of concordant, discordant, tied and total pairs.

Parameters

relative (bool, optional) – If True, return the relative share of each type of pair instead of the absolute count. By default is False.

Returns

A dataframe of shape (4, 1) containing in one column the information about concordant, discordant, tied and total pairs.

Return type

pd.DataFrame

Examples

>>> real = [1, 0, 0, 1, 0]
>>> fit = [0.81, 0.31, 0.81, 0.73, 0.45]
>>> from easypred import BinaryScore
>>> score = BinaryScore(real, fit, value_positive=1)
>>> score.pairs_count()
            Count
Concordant      4
Discordant      1
Tied            1
Total           6
>>> score.pairs_count(relative=True)
            Percentage
Concordant    0.666667
Discordant    0.166667
Tied          0.166667
Total              1.0
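
The counts in the example can be reproduced with NumPy broadcasting, comparing every positive score with every negative score. This is only a sketch of the counting logic with illustrative names; it returns plain dictionaries rather than the dataframe produced by pairs_count.

```python
import numpy as np

real = np.array([1, 0, 0, 1, 0])
scores = np.array([0.81, 0.31, 0.81, 0.73, 0.45])

pos = scores[real == 1][:, None]  # column vector of positive scores
neg = scores[real != 1][None, :]  # row vector of negative scores

# Each comparison broadcasts to a (positives x negatives) boolean matrix.
counts = {
    "Concordant": int(np.sum(pos > neg)),
    "Discordant": int(np.sum(pos < neg)),
    "Tied": int(np.sum(pos == neg)),
}
counts["Total"] = pos.size * neg.size
# Relative shares, analogous to relative=True.
shares = {name: count / counts["Total"] for name, count in counts.items()}
print(counts)
```
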
plot_metric(metric, figsize=(20, 10), show_legend=True, title_size=14, axes_labels_size=12, ax=None, **kwargs)

Plot the variation for one or more metrics given different values for the threshold telling “1s” from “0s”.

Parameters
  • metric (Metric function | list[Metric functions]) – A function from easypred.metrics or a list of such functions. It defines which values are to be plotted.

  • figsize (tuple[int, int], optional) – Tuple of integers specifying the size of the plot. Default is (20, 10).

  • show_legend (bool, optional) – If True, show the plot’s legend. By default is True.

  • title_size (int, optional) – Font size of the plot title. Default is 14.

  • axes_labels_size (int, optional) – Font size of the axes labels. Default is 12.

  • ax (matplotlib Axes, optional) – Axes object to draw the plot onto, otherwise creates new Figure and Axes. Use this option to further customize the plot.

  • kwargs (key, value mappings) – Other keyword arguments to be passed through to matplotlib.pyplot.hist().

Returns

Matplotlib Axes object with the plot drawn on it.

Return type

matplotlib Axes

Examples

With one metric

>>> real = [0, 1, 1, 0, 1, 0]
>>> fit = [0.31, 0.44, 0.73, 0.28, 0.37, 0.18]
>>> from easypred import BinaryScore
>>> score = BinaryScore(real, fit, value_positive=1)
>>> from easypred.metrics import accuracy_score
>>> score.plot_metric(metric=accuracy_score)
<AxesSubplot:title={'center':'accuracy_score given different thresholds'},
xlabel='Threshold', ylabel='Metric value'>
>>> from matplotlib import pyplot as plt
>>> plt.show()

Adding a second metric

>>> from easypred.metrics import f1_score
>>> score.plot_metric(metric=[accuracy_score, f1_score])
<AxesSubplot:title={'center':'accuracy_score & f1_score given different thresholds'},
xlabel='Threshold', ylabel='Metric value'>
>>> plt.show()
plot_roc_curve(figsize=(20, 10), plot_baseline=True, show_legend=True, title_size=14, axes_labels_size=12, ax=None, **kwargs)

Plot the ROC curve for the score. This curve depicts the True Positive Rate (Recall score) against the False Positive Rate.

Parameters
  • figsize (tuple[int, int], optional) – Tuple of integers specifying the size of the plot. Default is (20, 10).

  • plot_baseline (bool, optional) – If True, a reference straight line with slope 1 is added to the plot, representing the performance of a random classifier. By default is True.

  • title_size (int, optional) – Font size of the plot title. Default is 14.

  • axes_labels_size (int, optional) – Font size of the axes labels. Default is 12.

  • ax (matplotlib Axes, optional) – Axes object to draw the plot onto, otherwise creates new Figure and Axes. Use this option to further customize the plot.

  • kwargs (key, value mappings) – Other keyword arguments to be passed through to matplotlib.pyplot.plot().

  • show_legend (bool, optional) – If True, show the plot’s legend. By default is True.

Returns

Matplotlib Axes object with the plot drawn on it.

Return type

matplotlib Axes

Examples

>>> from easypred import BinaryScore
>>> score = BinaryScore([0, 1, 1, 0, 1, 0],
...                     [0.31, 0.44, 0.244, 0.28, 0.37, 0.241],
...                     value_positive=1)
>>> score.plot_roc_curve()
<AxesSubplot:title={'center':'ROC Curve'},
xlabel='False Positive Rate', ylabel='True Positive Rate'>
>>> from matplotlib import pyplot as plt
>>> plt.show()
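
The points the curve passes through can be sketched without plotting. The block below assumes one point per unique score used as threshold, with scores at or above the threshold classified as positive; on this data it reproduces the arrays shown in the false_positive_rates and recall_scores examples. The variable names are illustrative.

```python
import numpy as np

real = np.array([0, 1, 1, 0, 1, 0])
scores = np.array([0.31, 0.44, 0.244, 0.28, 0.37, 0.241])

thresholds = np.unique(scores)
# Row i holds the predictions obtained with thresholds[i].
predicted = scores[None, :] >= thresholds[:, None]

is_pos = real == 1
# True positive rate: share of real positives predicted as positive.
tpr = (predicted & is_pos).sum(axis=1) / is_pos.sum()
# False positive rate: share of real negatives predicted as positive.
fpr = (predicted & ~is_pos).sum(axis=1) / (~is_pos).sum()

for t, x, y in zip(thresholds, fpr, tpr):
    print(f"threshold={t:.3f}  FPR={x:.3f}  TPR={y:.3f}")
```
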
plot_score_histogram(figsize=(20, 10), title_size=14, axes_labels_size=12, ax=None, **kwargs)

Plot the histogram of the probability scores.

Parameters
  • figsize (tuple[int, int], optional) – Tuple of integers specifying the size of the plot. Default is (20, 10).

  • title_size (int, optional) – Font size of the plot title. Default is 14.

  • axes_labels_size (int, optional) – Font size of the axes labels. Default is 12.

  • ax (matplotlib Axes, optional) – Axes object to draw the plot onto, otherwise creates new Figure and Axes. Use this option to further customize the plot.

  • kwargs (key, value mappings) – Other keyword arguments to be passed through to matplotlib.pyplot.hist().

Returns

Matplotlib Axes object with the plot drawn on it.

Return type

matplotlib Axes

Examples

>>> from easypred import BinaryScore
>>> score = BinaryScore([0, 1, 1, 0, 1, 0],
...                     [0.31, 0.44, 0.244, 0.28, 0.37, 0.241],
...                     value_positive=1)
>>> score.plot_score_histogram()
<AxesSubplot:title={'center':'Fitted Scores Distribution'},
xlabel='Fitted Scores', ylabel='Frequency'>
>>> from matplotlib import pyplot as plt
>>> plt.show()

Passing keyword arguments to matplotlib’s hist function:

>>> score.plot_score_histogram(bins=10)
<AxesSubplot:title={'center':'Fitted Scores Distribution'},
xlabel='Fitted Scores', ylabel='Frequency'>
property recall_scores: numpy.ndarray

Return an array containing the recall scores calculated by setting the threshold at each unique score value.

Examples

>>> from easypred import BinaryScore
>>> score = BinaryScore([0, 1, 1, 0, 1, 0],
...                     [0.31, 0.44, 0.244, 0.28, 0.37, 0.241],
...                     value_positive=1)
>>> score.recall_scores
array([1.        , 1.        , 0.66666667, 0.66666667, 0.66666667,
       0.33333333])

Note that the length of the array changes if the number of decimals used in the computation of unique values is lowered to 2. This is because 0.241 and 0.244 establish a unique threshold equal to 0.24.

>>> score.computation_decimals = 2
>>> score.recall_scores
array([1.        , 0.66666667, 0.66666667, 0.66666667, 0.33333333])
score_to_values(threshold=0.5)

Return an array containing fitted values derived on the basis of the provided threshold.

Parameters

threshold (float, optional) – The minimum value such that the score is translated into value_positive. Any score below the threshold is instead associated with the other value. By default 0.5.

Returns

The array containing the inferred fitted values. Its type matches fitted_scores’ type.

Return type

np.ndarray | pd.Series

Examples

>>> from easypred import BinaryScore
>>> score = BinaryScore([0, 1, 1, 0, 1, 0],
...                     [0.31, 0.44, 0.24, 0.28, 0.37, 0.24],
...                     value_positive=1)
>>> score.score_to_values(threshold=0.6)
array([0, 0, 0, 0, 0, 0])
>>> score.score_to_values(threshold=0.31)
array([1, 1, 0, 0, 1, 0])
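
The conversion amounts to a single vectorised comparison. The sketch below is a hypothetical stand-in built on numpy.where, not easypred's implementation; to_values and its value_negative argument are names introduced here for illustration.

```python
import numpy as np

scores = np.array([0.31, 0.44, 0.24, 0.28, 0.37, 0.24])


def to_values(scores, threshold=0.5, value_positive=1, value_negative=0):
    """Hypothetical stand-in for the thresholding step."""
    # Scores at or above the threshold map to the positive value,
    # all the others to the negative value.
    return np.where(scores >= threshold, value_positive, value_negative)


print(to_values(scores, threshold=0.31))  # [1 1 0 0 1 0]
# The labels do not have to be numeric.
print(to_values(scores, threshold=0.31, value_positive="yes", value_negative="no"))
```

Note that the comparison is inclusive: the observation scoring exactly 0.31 is mapped to the positive value.
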
property somersd_score: float

Return the Somers’ D score, computed as the difference between the number of concordant and discordant pairs, divided by the total number of pairs.

Also called: Gini coefficient.

Returns

Somers’ D score.

Return type

float

References

https://en.wikipedia.org/wiki/Somers%27_D#Somers’_D_for_binary_dependent_variables
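
Because concordant, discordant and tied pairs add up to the total number of pairs, the quantity described above equals 2 * AUC - 1 when tied pairs count as one half in the AUC. The sketch below checks this numerically on the data from the auc_score example; the variable names are illustrative and do not come from easypred.

```python
import numpy as np

real = np.array([0, 1, 1, 0, 1, 0])
scores = np.array([0.31, 0.44, 0.24, 0.28, 0.37, 0.24])

# Score difference for every (positive, negative) pair.
diff = scores[real == 1][:, None] - scores[real != 1][None, :]
concordant = np.sum(diff > 0)
discordant = np.sum(diff < 0)
tied = np.sum(diff == 0)
total = diff.size

somers_d = (concordant - discordant) / total
auc = (concordant + 0.5 * tied) / total
print(round(float(somers_d), 4), round(float(2 * auc - 1), 4))  # 0.4444 0.4444
```
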

to_binary_prediction(threshold=0.5)

Create an instance of BinaryPrediction from the BinaryScore object.

Parameters

threshold (float | str, optional) –

If float, it is the minimum value such that the score is translated into value_positive. Any score below the threshold is instead associated with the other value. If str, the threshold is automatically set such that it maximizes the metric corresponding to the provided keyword. The available keywords are:

  • “f1”: maximize the f1 score

  • “accuracy”: maximize the accuracy score

By default 0.5.

Returns

An object of type BinaryPrediction, a subclass of Prediction specific for predictions with just two outcomes. The class instance is given the special attribute “threshold” that returns the threshold used in the conversion.

Return type

BinaryPrediction

Examples

>>> from easypred import BinaryScore
>>> score = BinaryScore([0, 1, 1, 0, 1, 0],
...                     [0.31, 0.44, 0.244, 0.28, 0.37, 0.241],
...                     value_positive=1)
>>> score.to_binary_prediction(threshold=0.37)
<easypred.binary_prediction.BinaryPrediction object at 0x000001E8C813FAF0>
property unique_scores: VectorPdNp

Return the unique values attained by the fitted scores, sorted in ascending order.

Returns

The array containing the sorted unique values. Its type matches fitted_scores’ type.

Return type

np.ndarray | pd.Series

Examples

>>> from easypred import BinaryScore
>>> score = BinaryScore([0, 1, 1, 0, 1, 0],
...                     [0.31, 0.44, 0.24, 0.28, 0.37, 0.24],
...                     value_positive=1)
>>> score.unique_scores
array([0.24, 0.28, 0.31, 0.37, 0.44])
property value_negative: Any

Return the value that is not the positive value.

Examples

>>> from easypred import BinaryScore
>>> score = BinaryScore([0, 1, 1, 0, 1, 0],
...                     [0.31, 0.44, 0.24, 0.28, 0.37, 0.18],
...                     value_positive=1)
>>> score.value_negative
0