API reference
fairness.data
Data loading utilities used by the fairness toolkit.
This module loads tabular datasets into a pandas DataFrame, while preserving row order and/or indices so that downstream steps can guarantee alignment between:
- model predictions (y_pred)
- true labels (y_test)
- protected attributes used to construct intersectional groups
Dataset-specific logic (e.g., mapping target labels, binning ages, cleaning special missing-value encodings such as '?') should live in small adapter functions.
Typical usage
```python
from fairness.data import load_csv, load_features_and_target

df = load_csv("data/heart.csv")
X, y = load_features_and_target(df, target_col="HeartDisease")
```
load_csv
Load a CSV file into a pandas DataFrame.
The CSV may be provided either as a local file path or as a URL (e.g. an HTTP(S) link to a raw CSV file).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `PathLike` | Path or URL to the CSV file. | required |
| `index_col` | `Optional[Union[int, str]]` | Column to use as the row index (passed to pandas.read_csv). If None, pandas uses a default integer index. | `None` |
| `na_values` | `Optional[Union[str, Sequence[str]]]` | Additional strings to recognise as NA/NaN. | `None` |
Returns:
| Type | Description |
|---|---|
| `DataFrame` | The dataset as a DataFrame. |
Raises:
| Type | Description |
|---|---|
| `FileNotFoundError` | If a local file path does not exist. |
| `ValueError` | If the loaded CSV is empty. |
load_features_and_target
Split a DataFrame into features X and target y.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `df` | `DataFrame` | Full dataset containing both features and target. | required |
| `target_col` | `str` | Name of the target column. | required |
| `drop_cols` | `Sequence[str]` | Additional columns to drop from X (e.g., derived protected attributes used only for fairness analysis such as 'age_group'). | `()` |
Returns:
| Type | Description |
|---|---|
| `tuple[DataFrame, Series]` | (X, y): X is a DataFrame of features, y is a Series of labels. |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If target_col is not in df, or if the resulting X is empty. |
load_heart_csv
Load the Heart Disease CSV used in the tutorial.
This is a wrapper around load_csv() that also checks that the expected target column is present.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `PathLike` | Path to heart.csv. | required |
| `target_col` | `str` | Expected target column name (used for validation). | `'HeartDisease'` |
Returns:
| Type | Description |
|---|---|
| `DataFrame` | Loaded dataset. |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If the expected target column is missing. |
validate_columns
Validate that required columns exist in the DataFrame.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `df` | `DataFrame` | Input DataFrame. | required |
| `required` | `Iterable[str]` | Column names that must be present. | required |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If any required column is missing. |
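
The check can be sketched in a few lines. This is a minimal illustration, not the toolkit's implementation: it takes a plain list of column names instead of a DataFrame so that it runs without pandas, and the helper name `check_required_columns` and the error message are assumptions.

```python
from typing import Iterable, Sequence


def check_required_columns(columns: Sequence[str], required: Iterable[str]) -> None:
    """Raise ValueError naming every required column absent from `columns`."""
    present = set(columns)
    missing = [name for name in required if name not in present]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")


# Passes silently because both required names are present.
check_required_columns(["Age", "Sex", "HeartDisease"], ["Age", "HeartDisease"])
```

With a DataFrame, the first argument would be `df.columns`.
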
fairness.preprocess
Preprocessing utilities for tabular datasets used in fairness analysis.
This module includes:
- feature engineering (e.g., binning age into age_group)
- converting raw tabular data into numeric features suitable for ML
- producing reproducible train/test splits while preserving indices
Design notes
- The toolkit is model-agnostic: these functions do not require sklearn pipelines, but they produce outputs compatible with sklearn and similar libraries.
- Protected attributes may be used for fairness analysis even if they are excluded from model training. Derived protected attributes (e.g. age_group) are excluded from model inputs.
Typical usage
```python
from fairness.data import load_csv
from fairness.preprocess import add_age_group, preprocess_tabular, make_train_test_split

df = load_csv("data/heart.csv")
df = add_age_group(df)
df_model = preprocess_tabular(df)
split = make_train_test_split(df_model, target_col="HeartDisease", drop_cols=("age_group",))
```
SplitData
dataclass
Container for a reproducible train/test split.
Attributes:
| Name | Type | Description |
|---|---|---|
| `X_train`, `X_test` | | Feature matrices for training and testing. |
| `y_train`, `y_test` | | Target vectors for training and testing. |
add_age_group
Add a categorical age-group column derived from a continuous age column.
This is useful for fairness analysis because continuous protected attributes (like age) create too many groups; binning yields interpretable groups.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `df` | `DataFrame` | Input dataset. | required |
| `age_col` | `str` | Name of the column containing numeric ages. | `'Age'` |
| `new_col` | `str` | Name of the derived categorical column to create. | `'age_group'` |
| `bins` | `Sequence[float]` | Bin edges passed to pandas.cut. | `(0, 55, 120)` |
| `labels` | `Sequence[str]` | Labels assigned to the bins. | `('young', 'older')` |
Returns:
| Type | Description |
|---|---|
| `DataFrame` | Copy of df with the new categorical column added. |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If age_col is missing or binning produces missing values. |
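
To make the bin edges concrete, the sketch below assigns labels using right-closed intervals, which is how pandas.cut treats edges by default: with the default bins (0, 55, 120), ages in (0, 55] map to 'young' and ages in (55, 120] map to 'older'. It uses only the standard library; the helper name `assign_bin` is hypothetical, and it assumes the toolkit calls pandas.cut with its default interval settings.

```python
import bisect
from typing import Optional, Sequence


def assign_bin(age: float, edges: Sequence[float], labels: Sequence[str]) -> Optional[str]:
    """Return the label of the right-closed bin (lo, hi] containing `age`, else None."""
    # bisect_left finds the first edge >= age, so an age equal to an upper
    # edge stays in the lower bin, as with right-closed intervals.
    idx = bisect.bisect_left(edges, age) - 1
    if 0 <= idx < len(labels):
        return labels[idx]
    return None  # outside every bin: the case documented as raising ValueError


edges, labels = (0, 55, 120), ("young", "older")
groups = [assign_bin(age, edges, labels) for age in (40, 55, 56, 120)]
```

Under these assumptions an age of exactly 0, or any age above 120, falls outside every bin, which corresponds to the "binning produces missing values" error case.
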
apply_transforms
Apply a sequence of DataFrame -> DataFrame transforms in order.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `df` | `DataFrame` | Input dataset. | required |
| `transforms` | `Sequence[Callable[[DataFrame], DataFrame]]` | Sequence of callables each returning a modified DataFrame. | required |
Returns:
| Type | Description |
|---|---|
| `DataFrame` | Transformed DataFrame. |
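
Applying transforms in order amounts to a left fold over the sequence. The sketch below shows the idea with lists of dicts standing in for a DataFrame so that it runs without pandas; `apply_in_order`, `add_older_flag` and `drop_age` are hypothetical names used only for illustration.

```python
from functools import reduce
from typing import Callable, Dict, List, Sequence

Rows = List[Dict[str, object]]


def apply_in_order(rows: Rows, transforms: Sequence[Callable[[Rows], Rows]]) -> Rows:
    """Feed `rows` through each transform, left to right."""
    return reduce(lambda acc, transform: transform(acc), transforms, rows)


def add_older_flag(rows: Rows) -> Rows:
    # Returns new dicts, leaving the input rows untouched.
    return [{**row, "older": row["Age"] > 55} for row in rows]


def drop_age(rows: Rows) -> Rows:
    return [{k: v for k, v in row.items() if k != "Age"} for row in rows]


rows = [{"Age": 40}, {"Age": 70}]
result = apply_in_order(rows, [add_older_flag, drop_age])
```

Order matters here: `drop_age` must run after `add_older_flag`, because the flag is derived from the column that is later removed.
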
make_train_test_split
Create a reproducible train/test split for modelling.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `df` | `DataFrame` | Preprocessed dataset containing features and target. | required |
| `target_col` | `str` | Name of the target column. | required |
| `drop_cols` | `Sequence[str]` | Additional columns to exclude from X (e.g. derived protected attributes). | `()` |
| `test_size` | `float` | Fraction of rows assigned to the test set. | `0.3` |
| `random_state` | `int` | Random seed for reproducibility. | `42` |
| `stratify` | `bool` | If True, stratify split by the target to preserve class balance. | `True` |
Returns:
| Type | Description |
|---|---|
| `SplitData` | Container holding X_train, X_test, y_train, y_test. |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If target_col is missing or df is empty. |
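
The sketch below illustrates what "stratified" and "reproducible" mean for a split, using only the standard library. It is not the toolkit's algorithm (which is not shown in this reference) and will not reproduce its exact row assignment; `stratified_split_indices` is a hypothetical helper that returns row positions, so the original indices survive the split.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def stratified_split_indices(
    labels: Sequence[Hashable], test_size: float = 0.3, random_state: int = 42
) -> Tuple[List[int], List[int]]:
    """Return (train_idx, test_idx), sampling each class separately."""
    rng = random.Random(random_state)  # fixed seed gives the same split on every run
    by_class: Dict[Hashable, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train: List[int] = []
    test: List[int] = []
    for indices in by_class.values():
        shuffled = indices[:]
        rng.shuffle(shuffled)
        n_test = round(len(shuffled) * test_size)  # same fraction from each class
        test.extend(shuffled[:n_test])
        train.extend(shuffled[n_test:])
    return sorted(train), sorted(test)


y = [0] * 10 + [1] * 10
train_idx, test_idx = stratified_split_indices(y)
```

Because each class contributes the same fraction to the test set, the class balance of `y` is preserved in both parts.
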
map_binary_column
Map values of a binary/categorical column to new values (e.g., 'M'/'F' -> 1/0).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `df` | `DataFrame` | Input dataset. | required |
| `col` | `str` | Column name to map. | required |
| `mapping` | `Mapping[object, object]` | Dictionary defining how to map values. | required |
| `strict` | `bool` | If True, raise if unmapped values occur. If False, leave unmapped as-is. | `True` |
Returns:
| Type | Description |
|---|---|
| `DataFrame` | Copy of df with mapped column. |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If strict=True and unmapped values are found. |
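
The effect of `strict` can be sketched on a plain list of values. This is an illustration under stated assumptions rather than the toolkit's code: `map_values` is a hypothetical helper, and the wording of the error message is invented.

```python
from typing import Hashable, List, Mapping, Sequence


def map_values(
    values: Sequence[Hashable], mapping: Mapping[Hashable, object], strict: bool = True
) -> List[object]:
    """Map each value through `mapping`; unmapped values raise or pass through."""
    unmapped = sorted({v for v in values if v not in mapping}, key=repr)
    if strict and unmapped:
        raise ValueError(f"Unmapped values: {unmapped}")
    # With strict=False, values without a mapping are kept as-is.
    return [mapping.get(v, v) for v in values]


encoded = map_values(["M", "F", "M"], {"M": 1, "F": 0})
lenient = map_values(["M", "X"], {"M": 1}, strict=False)
```
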
preprocess_tabular
Convert a tabular DataFrame into numeric ML-ready features.
Performs one-hot encoding for categorical columns (object/category) and leaves numeric columns unchanged.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `df` | `DataFrame` | Input dataset. | required |
| `drop_cols` | `Sequence[str]` | Columns to drop prior to encoding. | `()` |
| `one_hot` | `bool` | Whether to one-hot encode categorical columns. | `True` |
| `drop_first` | `bool` | If one_hot=True, drop the first level for each categorical variable to avoid perfect multicollinearity in logistic regression models. | `True` |
Returns:
| Type | Description |
|---|---|
| `DataFrame` | A numeric DataFrame compatible with scikit-learn. |
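
The role of `drop_first` is easiest to see on a single column. The sketch below one-hot encodes a list of strings with the standard library; `one_hot` is a hypothetical helper, the column name and values are illustrative, and the `name_level` column naming and sorted level order are assumptions modelled on pandas.get_dummies rather than confirmed toolkit behaviour.

```python
from typing import Dict, List, Sequence


def one_hot(values: Sequence[str], name: str, drop_first: bool = True) -> Dict[str, List[int]]:
    """Encode one categorical column as 0/1 indicator columns."""
    levels = sorted(set(values))
    if drop_first:
        # The dropped level is implied when every remaining indicator is 0.
        levels = levels[1:]
    return {f"{name}_{level}": [int(v == level) for v in values] for level in levels}


encoded = one_hot(["ATA", "NAP", "ASY", "NAP"], name="ChestPainType")
full = one_hot(["ATA", "NAP", "ASY", "NAP"], name="ChestPainType", drop_first=False)
```

With all levels kept, the indicator columns of a variable sum to 1 in every row, which duplicates the information in a model's intercept; dropping one level removes that redundancy.
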
fairness.groups
Minimal utilities for constructing intersectional group labels and producing an evaluation DataFrame aligned with model predictions and true labels.
Primary output: a tidy DataFrame with columns:
- subject_label (intersectional group label per individual)
- y_pred (model prediction)
- y_true (true label)
make_eval_df
Build an evaluation DataFrame for group-based metric functions.
The handoff format for metrics such as accuracy_diff:
subject_labels = eval_df[label_col].tolist()
predictions = eval_df["y_pred"].tolist()
true_statuses = eval_df["y_true"].tolist()
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `df_test` | `DataFrame` | Test-set DataFrame in the SAME row order as y_pred and y_true (typically df.loc[split.X_test.index]). | required |
| `protected` | `Sequence[str]` | Protected columns used to define intersectional groups. | required |
| `y_pred` | `Sequence` | Model predictions aligned to df_test rows. | required |
| `y_true` | `Sequence` | True labels aligned to df_test rows. | required |
| `label_col` | `str` | Name of the intersectional label column. | `'subject_label'` |
Returns:
| Type | Description |
|---|---|
| `DataFrame` | Columns: subject_label, y_pred, y_true (index preserved). |
make_intersectional_labels
Create an intersectional group label for each row of df.
Example: Sex=1|age_group=older
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `df` | `DataFrame` | DataFrame containing protected columns. | required |
| `protected` | `Sequence[str]` | Column names to intersect (order defines label format). | required |
| `sep` | `str` | Separator placed between the per-column parts of a label. | `'\|'` |
| `kv_sep` | `str` | Separator placed between a column name and its value within each part. | `'\|'` |
| `missing` | `str` | Placeholder for missing values. | `'NA'` |
Returns:
| Type | Description |
|---|---|
| `list[str]` | One label per row, aligned with df. |
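
The label format in the example above can be reproduced with a short sketch. It works on a list of dicts rather than a DataFrame, `intersectional_labels` is a hypothetical helper, and `kv_sep="="` is chosen here only because it yields the documented example string Sex=1|age_group=older.

```python
from typing import Dict, List, Sequence


def intersectional_labels(
    rows: Sequence[Dict[str, object]],
    protected: Sequence[str],
    sep: str = "|",
    kv_sep: str = "=",
    missing: str = "NA",
) -> List[str]:
    """Build one 'col=value|col=value' label per row, in `protected` order."""
    labels = []
    for row in rows:
        parts = []
        for col in protected:
            value = row.get(col)
            # Missing values are replaced by the placeholder.
            parts.append(f"{col}{kv_sep}{missing if value is None else value}")
        labels.append(sep.join(parts))
    return labels


rows = [{"Sex": 1, "age_group": "older"}, {"Sex": 0, "age_group": None}]
labels = intersectional_labels(rows, protected=["Sex", "age_group"])
```

Because the output has one label per input row in the same order, it stays aligned with predictions and true labels taken from the same rows.
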
fairness.adapters
make_subject_labels_dict
Build the dict-of-lists format expected by intersect_* functions.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `df_test` | `DataFrame` | Test-set DataFrame containing the protected columns. | required |
| `protected_cols` | `list[str]` | Names of the protected columns, e.g. ["Sex", "age_group"]. | required |
Returns:
| Type | Description |
|---|---|
| `dict[str, list]` | Mapping from each protected column name to its list of values, aligned row-wise with eval_df. |
unpack_eval_df
Convert eval_df into the list inputs expected by group_* metric functions.
Expects eval_df columns:
- subject_label (str)
- y_pred (0/1)
- y_true (0/1)
Returns:
| Name | Type | Description |
|---|---|---|
| `subject_labels` | `list[str]` | Values of the subject_label column. |
| `predictions` | `list[int]` | Values of the y_pred column. |
| `true_statuses` | `list[int]` | Values of the y_true column. |
fairness.metrics
all_intersect_accs
Calculate accuracies for all possible intersectional groups.
Computes accuracy for every combination of categories in the dataset (e.g., all age-group-gender combinations).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `subject_labels_dict` | `dict` | Dictionary mapping category names to lists of labels for each observation in the evaluation dataset. | required |
| `predictions` | `list[bool]` | A list of predicted diagnoses for each observation in the evaluation dataset. | required |
| `true_statuses` | `list[bool]` | A list of true diagnoses for each observation in the evaluation dataset. | required |
Returns:
| Type | Description |
|---|---|
| `dict` | Dictionary mapping intersectional group names (formatted as "label1 + label2 + ...") to their respective accuracies. |
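
Enumerating every combination of categories is a Cartesian product over the distinct labels of each category. The sketch below shows one way this could work; `all_group_accuracies` is a hypothetical stand-in, and returning NaN for combinations with no observations and sorting the labels are assumptions, not confirmed behaviour of all_intersect_accs.

```python
import math
from itertools import product
from typing import Dict, List, Sequence


def all_group_accuracies(
    subject_labels_dict: Dict[str, List[str]],
    predictions: Sequence[bool],
    true_statuses: Sequence[bool],
) -> Dict[str, float]:
    """Accuracy for every combination of category labels, keyed 'a + b'."""
    categories = list(subject_labels_dict)
    levels = [sorted(set(subject_labels_dict[c])) for c in categories]
    results: Dict[str, float] = {}
    for combo in product(*levels):
        # Rows that carry this combination's label in every category.
        members = [
            i for i in range(len(predictions))
            if all(subject_labels_dict[c][i] == v for c, v in zip(categories, combo))
        ]
        name = " + ".join(combo)
        if members:
            correct = sum(predictions[i] == true_statuses[i] for i in members)
            results[name] = correct / len(members)
        else:
            results[name] = math.nan  # combination never observed
    return results


subject = {"age": ["young", "young", "older", "older"], "sex": ["F", "M", "F", "F"]}
accs = all_group_accuracies(subject, [True, False, True, False], [True, True, True, True])
```

The number of groups grows multiplicatively with each added category, so sparsely populated or empty combinations become more likely as more protected attributes are intersected.
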
all_intersect_fdrs
Calculate false discovery rates for all possible intersectional groups.
Computes false discovery rate for every combination of categories in the dataset (e.g., all age-group-gender combinations).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `subject_labels_dict` | `dict` | Dictionary mapping category names to lists of labels for each observation in the evaluation dataset. | required |
| `predictions` | `list[bool]` | A list of predicted diagnoses for each observation in the evaluation dataset. | required |
| `true_statuses` | `list[bool]` | A list of true diagnoses for each observation in the evaluation dataset. | required |
Returns:
| Type | Description |
|---|---|
| `dict` | Dictionary mapping intersectional group names (as strings with ' + ' separating categories) to their false discovery rates. |
all_intersect_fnrs
Calculate false negative rates for all possible intersectional groups.
Computes false negative rate for every combination of categories in the dataset (e.g., all age-group-gender combinations).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `subject_labels_dict` | `dict` | Dictionary mapping category names to lists of labels for each observation in the evaluation dataset. | required |
| `predictions` | `list[bool]` | A list of predicted diagnoses for each observation in the evaluation dataset. | required |
| `true_statuses` | `list[bool]` | A list of true diagnoses for each observation in the evaluation dataset. | required |
Returns:
| Type | Description |
|---|---|
| `dict` | Dictionary mapping intersectional group names (as strings with ' + ' separating categories) to their false negative rates. |
all_intersect_fors
Calculate false omission rates for all possible intersectional groups.
Computes false omission rate for every combination of categories in the dataset (e.g., all age-group-gender combinations).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `subject_labels_dict` | `dict` | Dictionary mapping category names to lists of labels for each observation in the evaluation dataset. | required |
| `predictions` | `list[bool]` | A list of predicted diagnoses for each observation in the evaluation dataset. | required |
| `true_statuses` | `list[bool]` | A list of true diagnoses for each observation in the evaluation dataset. | required |
Returns:
| Type | Description |
|---|---|
| `dict` | Dictionary mapping intersectional group names (as strings with ' + ' separating categories) to their false omission rates. |
all_intersect_fprs
Calculate false positive rates for all possible intersectional groups.
Computes false positive rate for every combination of categories in the dataset (e.g., all age-group-gender combinations).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `subject_labels_dict` | `dict` | Dictionary mapping category names to lists of labels for each observation in the evaluation dataset. | required |
| `predictions` | `list[bool]` | A list of predicted diagnoses for each observation in the evaluation dataset. | required |
| `true_statuses` | `list[bool]` | A list of true diagnoses for each observation in the evaluation dataset. | required |
Returns:
| Type | Description |
|---|---|
| `dict` | Dictionary mapping intersectional group names (as strings with ' + ' separating categories) to their false positive rates. |
group_acc
Find the accuracy of a group with a specific label.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `group_label` | `str or int` | The label of the group for which the accuracy of the model should be evaluated. | required |
| `subject_labels` | `dict` | A dictionary containing subject labels for every observation in the evaluation dataset. | required |
| `predictions` | `list[bool]` | A list of predicted diagnoses for each observation in the evaluation dataset. | required |
| `true_statuses` | `list[bool]` | A list of true diagnoses for each observation in the evaluation dataset. | required |
Returns:
| Type | Description |
|---|---|
| `float` | The accuracy of the model in the specified group. Returns np.nan if the group has no observations. |
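
Group accuracy is the share of correct predictions among the observations carrying the given label. The sketch below is a minimal illustration, not the toolkit's code: `accuracy_for_group` is a hypothetical helper, it assumes `subject_labels` is a flat sequence with one label per observation (the shape produced by unpack_eval_df), and it uses `math.nan` in place of np.nan so it runs without NumPy.

```python
import math
from typing import Hashable, Sequence


def accuracy_for_group(
    group_label: Hashable,
    subject_labels: Sequence[Hashable],
    predictions: Sequence[int],
    true_statuses: Sequence[int],
) -> float:
    """Fraction of correct predictions among rows labelled `group_label`."""
    hits = [
        pred == truth
        for label, pred, truth in zip(subject_labels, predictions, true_statuses)
        if label == group_label
    ]
    return sum(hits) / len(hits) if hits else math.nan


subject_labels = ["A", "A", "B", "B", "B"]
predictions = [1, 0, 1, 1, 0]
true_statuses = [1, 1, 1, 0, 0]
acc_a = accuracy_for_group("A", subject_labels, predictions, true_statuses)
acc_b = accuracy_for_group("B", subject_labels, predictions, true_statuses)
```
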
group_acc_diff
Calculate the absolute difference in accuracy between two groups.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `group_a_label` | `str or int` | The label of the first group. | required |
| `group_b_label` | `str or int` | The label of the second group. | required |
| `subject_labels` | `dict` | A dictionary containing subject labels for every observation in the evaluation dataset. | required |
| `predictions` | `list[bool]` | A list of predicted diagnoses for each observation in the evaluation dataset. | required |
| `true_statuses` | `list[bool]` | A list of true diagnoses for each observation in the evaluation dataset. | required |
Returns:
| Type | Description |
|---|---|
| `float` | The absolute difference in accuracy between the two groups. Returns np.nan if either group has no observations. |
group_acc_ratio
Calculate the ratio of accuracies between two groups.
Computes the maximum of the two possible ratios (group A / group B and group B / group A) to ensure the ratio is always >= 1.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `group_a_label` | `str or int` | The label of the first group. | required |
| `group_b_label` | `str or int` | The label of the second group. | required |
| `subject_labels` | `dict` | A dictionary containing subject labels for every observation in the evaluation dataset. | required |
| `predictions` | `list[bool]` | A list of predicted diagnoses for each observation in the evaluation dataset. | required |
| `true_statuses` | `list[bool]` | A list of true diagnoses for each observation in the evaluation dataset. | required |
| `natural_log` | `bool` | If True, return the natural logarithm of the ratio. Default is True. | `True` |
Returns:
| Type | Description |
|---|---|
| `float` | The (log) ratio of accuracies between the two groups. Returns np.nan if either group has no observations or if either accuracy is 0. |
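
The ratio rule described above applies to any pair of per-group rates, so it can be sketched independently of how the rates were computed. `symmetric_ratio` is a hypothetical helper; it takes the two rates directly and uses `math.nan` in place of np.nan.

```python
import math


def symmetric_ratio(rate_a: float, rate_b: float, natural_log: bool = True) -> float:
    """Larger of a/b and b/a, optionally on the natural-log scale."""
    if math.isnan(rate_a) or math.isnan(rate_b) or rate_a == 0 or rate_b == 0:
        return math.nan  # undefined when a group is empty or a rate is zero
    ratio = max(rate_a / rate_b, rate_b / rate_a)
    return math.log(ratio) if natural_log else ratio


raw = symmetric_ratio(0.8, 0.4, natural_log=False)
logged = symmetric_ratio(0.4, 0.8)
```

Taking the larger of the two ratios makes the result independent of which group is called A. On the log scale, equal rates give 0 and any disparity gives a positive value.
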
group_fdr
Find the false discovery rate of a group with a specific label.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `group_label` | `str or int` | The label of the group for which the false discovery rate of the model should be evaluated. | required |
| `subject_labels` | `dict` | A dictionary containing subject labels for every observation in the evaluation dataset. | required |
| `predictions` | `list[bool]` | A list of predicted diagnoses for each observation in the evaluation dataset. | required |
| `true_statuses` | `list[bool]` | A list of true diagnoses for each observation in the evaluation dataset. | required |
Returns:
| Type | Description |
|---|---|
| `float` | The false discovery rate of the model in the specified group. Returns np.nan if the group has no observations. |
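
The four error rates in this module (false discovery, false negative, false omission and false positive) are all ratios of confusion-matrix counts. The sketch below uses the standard definitions FDR = FP / (FP + TP), FNR = FN / (FN + TP), FOR = FN / (FN + TN) and FPR = FP / (FP + TN), applied to the predictions and true labels of one group. The helper names are hypothetical, and returning NaN whenever a denominator is zero is an assumption: this reference only documents NaN for groups with no observations.

```python
import math
from typing import Dict, Sequence


def confusion_counts(predictions: Sequence[bool], true_statuses: Sequence[bool]) -> Dict[str, int]:
    """Count true/false positives and negatives."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for pred, truth in zip(predictions, true_statuses):
        # First letter: was the prediction correct? Second: what was predicted?
        key = ("t" if pred == truth else "f") + ("p" if pred else "n")
        counts[key] += 1
    return counts


def safe_div(numerator: int, denominator: int) -> float:
    return numerator / denominator if denominator else math.nan


def error_rates(predictions: Sequence[bool], true_statuses: Sequence[bool]) -> Dict[str, float]:
    c = confusion_counts(predictions, true_statuses)
    return {
        "fdr": safe_div(c["fp"], c["fp"] + c["tp"]),  # wrong among predicted positives
        "fnr": safe_div(c["fn"], c["fn"] + c["tp"]),  # missed among actual positives
        "for": safe_div(c["fn"], c["fn"] + c["tn"]),  # wrong among predicted negatives
        "fpr": safe_div(c["fp"], c["fp"] + c["tn"]),  # false alarms among actual negatives
    }


preds = [True, True, True, False, False, False, False, False]
truth = [True, True, False, True, False, False, False, False]
rates = error_rates(preds, truth)
```

FDR and FOR condition on the prediction, while FNR and FPR condition on the true status, so the two pairs answer different questions about the same group.
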
group_fdr_diff
Calculate the absolute difference in false discovery rate between two groups.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `group_a_label` | `str or int` | The label of the first group. | required |
| `group_b_label` | `str or int` | The label of the second group. | required |
| `subject_labels` | `dict` | A dictionary containing subject labels for every observation in the evaluation dataset. | required |
| `predictions` | `list[bool]` | A list of predicted diagnoses for each observation in the evaluation dataset. | required |
| `true_statuses` | `list[bool]` | A list of true diagnoses for each observation in the evaluation dataset. | required |
Returns:
| Type | Description |
|---|---|
| `float` | The absolute difference in false discovery rate between the two groups. Returns np.nan if either group has no observations. |
group_fdr_ratio
Calculate the ratio of false discovery rates between two groups.
Computes the maximum of the two possible ratios (group A / group B and group B / group A) to ensure the ratio is always >= 1.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `group_a_label` | `str or int` | The label of the first group. | required |
| `group_b_label` | `str or int` | The label of the second group. | required |
| `subject_labels` | `dict` | A dictionary containing subject labels for every observation in the evaluation dataset. | required |
| `predictions` | `list[bool]` | A list of predicted diagnoses for each observation in the evaluation dataset. | required |
| `true_statuses` | `list[bool]` | A list of true diagnoses for each observation in the evaluation dataset. | required |
| `natural_log` | `bool` | If True, return the natural logarithm of the ratio. Default is True. | `True` |
Returns:
| Type | Description |
|---|---|
| `float` | The (log) ratio of false discovery rates between the two groups. Returns np.nan if either group has no observations or if either false discovery rate is 0. |
group_fnr
Find the false negative rate of a group with a specific label.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `group_label` | `str or int` | The label of the group for which the false negative rate of the model should be evaluated. | required |
| `subject_labels` | `dict` | A dictionary containing subject labels for every observation in the evaluation dataset. | required |
| `predictions` | `list[bool]` | A list of predicted diagnoses for each observation in the evaluation dataset. | required |
| `true_statuses` | `list[bool]` | A list of true diagnoses for each observation in the evaluation dataset. | required |
Returns:
| Type | Description |
|---|---|
| `float` | The false negative rate of the model in the specified group. Returns np.nan if the group has no observations. |
group_fnr_diff
Calculate the absolute difference in false negative rate between two groups.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `group_a_label` | `str or int` | The label of the first group. | required |
| `group_b_label` | `str or int` | The label of the second group. | required |
| `subject_labels` | `dict` | A dictionary containing subject labels for every observation in the evaluation dataset. | required |
| `predictions` | `list[bool]` | A list of predicted diagnoses for each observation in the evaluation dataset. | required |
| `true_statuses` | `list[bool]` | A list of true diagnoses for each observation in the evaluation dataset. | required |
Returns:
| Type | Description |
|---|---|
| `float` | The absolute difference in false negative rate between the two groups. Returns np.nan if either group has no observations. |
group_fnr_ratio
Calculate the ratio of false negative rates between two groups.
Computes the maximum of the two possible ratios (group A / group B and group B / group A) to ensure the ratio is always >= 1.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `group_a_label` | `str or int` | The label of the first group. | required |
| `group_b_label` | `str or int` | The label of the second group. | required |
| `subject_labels` | `dict` | A dictionary containing subject labels for every observation in the evaluation dataset. | required |
| `predictions` | `list[bool]` | A list of predicted diagnoses for each observation in the evaluation dataset. | required |
| `true_statuses` | `list[bool]` | A list of true diagnoses for each observation in the evaluation dataset. | required |
| `natural_log` | `bool` | If True, return the natural logarithm of the ratio. Default is True. | `True` |
Returns:
| Type | Description |
|---|---|
| `float` | The (log) ratio of false negative rates between the two groups. Returns np.nan if either group has no observations or if either false negative rate is 0. |
group_for
Find the false omission rate of a group with a specific label.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `group_label` | `str or int` | The label of the group for which the false omission rate of the model should be evaluated. | required |
| `subject_labels` | `dict` | A dictionary containing subject labels for every observation in the evaluation dataset. | required |
| `predictions` | `list[bool]` | A list of predicted diagnoses for each observation in the evaluation dataset. | required |
| `true_statuses` | `list[bool]` | A list of true diagnoses for each observation in the evaluation dataset. | required |
Returns:
| Type | Description |
|---|---|
| `float` | The false omission rate of the model in the specified group. Returns np.nan if the group has no observations. |
group_for_diff
Calculate the absolute difference in false omission rate between two groups.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `group_a_label` | `str or int` | The label of the first group. | required |
| `group_b_label` | `str or int` | The label of the second group. | required |
| `subject_labels` | `dict` | A dictionary containing subject labels for every observation in the evaluation dataset. | required |
| `predictions` | `list[bool]` | A list of predicted diagnoses for each observation in the evaluation dataset. | required |
| `true_statuses` | `list[bool]` | A list of true diagnoses for each observation in the evaluation dataset. | required |
Returns:
| Type | Description |
|---|---|
| `float` | The absolute difference in false omission rate between the two groups. Returns np.nan if either group has no observations. |
group_for_ratio
Calculate the ratio of false omission rates between two groups.
Computes the maximum of the two possible ratios (group A / group B and group B / group A) to ensure the ratio is always >= 1.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `group_a_label` | `str or int` | The label of the first group. | required |
| `group_b_label` | `str or int` | The label of the second group. | required |
| `subject_labels` | `dict` | A dictionary containing subject labels for every observation in the evaluation dataset. | required |
| `predictions` | `list[bool]` | A list of predicted diagnoses for each observation in the evaluation dataset. | required |
| `true_statuses` | `list[bool]` | A list of true diagnoses for each observation in the evaluation dataset. | required |
| `natural_log` | `bool` | If True, return the natural logarithm of the ratio. Default is True. | `True` |
Returns:
| Type | Description |
|---|---|
| `float` | The (log) ratio of false omission rates between the two groups. Returns np.nan if either group has no observations or if either false omission rate is 0. |
group_fpr
Find the false positive rate of a group with a specific label.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `group_label` | `str or int` | The label of the group for which the false positive rate of the model should be evaluated. | required |
| `subject_labels` | `dict` | A dictionary containing subject labels for every observation in the evaluation dataset. | required |
| `predictions` | `list[bool]` | A list of predicted diagnoses for each observation in the evaluation dataset. | required |
| `true_statuses` | `list[bool]` | A list of true diagnoses for each observation in the evaluation dataset. | required |
Returns:
| Type | Description |
|---|---|
| `float` | The false positive rate of the model in the specified group. Returns np.nan if the group has no observations. |
group_fpr_diff
Calculate the absolute difference in false positive rate between two groups.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `group_a_label` | `str or int` | The label of the first group. | required |
| `group_b_label` | `str or int` | The label of the second group. | required |
| `subject_labels` | `dict` | A dictionary containing subject labels for every observation in the evaluation dataset. | required |
| `predictions` | `list[bool]` | A list of predicted diagnoses for each observation in the evaluation dataset. | required |
| `true_statuses` | `list[bool]` | A list of true diagnoses for each observation in the evaluation dataset. | required |
Returns:
| Type | Description |
|---|---|
| `float` | The absolute difference in false positive rate between the two groups. Returns np.nan if either group has no observations. |
group_fpr_ratio
Calculate the ratio of false positive rates between two groups.
Computes the maximum of the two possible ratios (group A / group B and group B / group A) to ensure the ratio is always >= 1.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `group_a_label` | `str or int` | The label of the first group. | required |
| `group_b_label` | `str or int` | The label of the second group. | required |
| `subject_labels` | `dict` | A dictionary containing subject labels for every observation in the evaluation dataset. | required |
| `predictions` | `list[bool]` | A list of predicted diagnoses for each observation in the evaluation dataset. | required |
| `true_statuses` | `list[bool]` | A list of true diagnoses for each observation in the evaluation dataset. | required |
| `natural_log` | `bool` | If True, return the natural logarithm of the ratio. Default is True. | `True` |
Returns:
| Type | Description |
|---|---|
| `float` | The (log) ratio of false positive rates between the two groups. Returns np.nan if either group has no observations or if either false positive rate is 0. |
intersect_acc
Calculate accuracy for an intersectional group.
An intersectional group is defined by membership in specific categories across multiple dimensions (e.g., specific age category and specific gender).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `group_labels_dict` | `dict` | Dictionary mapping category names to specific group labels that define the intersectional group (e.g., {'age': 'Older', 'gender': 'Female'}). | required |
| `subject_labels_dict` | `dict` | Dictionary mapping category names to lists of labels for each observation in the evaluation dataset. | required |
| `predictions` | `list[bool]` | A list of predicted diagnoses for each observation in the evaluation dataset. | required |
| `true_statuses` | `list[bool]` | A list of true diagnoses for each observation in the evaluation dataset. | required |
Returns:
| Type | Description |
|---|---|
| `float` | The accuracy of the model in the specified intersectional group. Returns np.nan if the group has no observations. |
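
Membership in an intersectional group means matching the requested label in every listed category at once. The sketch below selects those observations and computes their accuracy; `intersect_members` and `intersect_accuracy` are hypothetical helpers written for illustration, with `math.nan` standing in for np.nan.

```python
import math
from typing import Dict, List, Sequence


def intersect_members(
    group_labels_dict: Dict[str, str], subject_labels_dict: Dict[str, List[str]]
) -> List[int]:
    """Positions of observations matching every category label in the group."""
    n = min((len(v) for v in subject_labels_dict.values()), default=0)
    return [
        i for i in range(n)
        if all(subject_labels_dict[cat][i] == label for cat, label in group_labels_dict.items())
    ]


def intersect_accuracy(
    group_labels_dict: Dict[str, str],
    subject_labels_dict: Dict[str, List[str]],
    predictions: Sequence[bool],
    true_statuses: Sequence[bool],
) -> float:
    members = intersect_members(group_labels_dict, subject_labels_dict)
    if not members:
        return math.nan  # no observation belongs to this intersection
    return sum(predictions[i] == true_statuses[i] for i in members) / len(members)


subject = {
    "age": ["Older", "Older", "Younger", "Older"],
    "gender": ["Female", "Male", "Female", "Female"],
}
group = {"age": "Older", "gender": "Female"}
members = intersect_members(group, subject)
acc = intersect_accuracy(group, subject, [True, False, True, False], [True, True, False, False])
```
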
intersect_fdr
Calculate false discovery rate for an intersectional group.
An intersectional group is defined by membership in specific categories across multiple dimensions (e.g., specific age category and specific gender).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `group_labels_dict` | `dict` | Dictionary mapping category names to specific group labels that define the intersectional group (e.g., {'age': 'Older', 'gender': 'Female'}). | required |
| `subject_labels_dict` | `dict` | Dictionary mapping category names to lists of labels for each observation in the evaluation dataset. | required |
| `predictions` | `list[bool]` | A list of predicted diagnoses for each observation in the evaluation dataset. | required |
| `true_statuses` | `list[bool]` | A list of true diagnoses for each observation in the evaluation dataset. | required |
Returns:
| Type | Description |
|---|---|
| `float` | The false discovery rate of the model in the specified intersectional group. Returns np.nan if the group has no observations. |
intersect_fnr
Calculate false negative rate for an intersectional group.
An intersectional group is defined by membership in specific categories across multiple dimensions (e.g., specific age category and specific gender).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `group_labels_dict` | `dict` | Dictionary mapping category names to specific group labels that define the intersectional group (e.g., {'age': 'Older', 'gender': 'Female'}). | required |
| `subject_labels_dict` | `dict` | Dictionary mapping category names to lists of labels for each observation in the evaluation dataset. | required |
| `predictions` | `list[bool]` | A list of predicted diagnoses for each observation in the evaluation dataset. | required |
| `true_statuses` | `list[bool]` | A list of true diagnoses for each observation in the evaluation dataset. | required |
Returns:
| Type | Description |
|---|---|
| `float` | The false negative rate of the model in the specified intersectional group. Returns np.nan if the group has no observations. |
intersect_for
Calculate false omission rate for an intersectional group.
An intersectional group is defined by membership in specific categories across multiple dimensions (e.g., specific age category and specific gender).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `group_labels_dict` | `dict` | Dictionary mapping category names to specific group labels that define the intersectional group (e.g., {'age': 'Older', 'gender': 'Female'}). | required |
| `subject_labels_dict` | `dict` | Dictionary mapping category names to lists of labels for each observation in the evaluation dataset. | required |
| `predictions` | `list[bool]` | A list of predicted diagnoses for each observation in the evaluation dataset. | required |
| `true_statuses` | `list[bool]` | A list of true diagnoses for each observation in the evaluation dataset. | required |
Returns:
| Type | Description |
|---|---|
| `float` | The false omission rate of the model in the specified intersectional group. Returns np.nan if the group has no observations. |
intersect_fpr
¶
Calculate false positive rate for an intersectional group.
An intersectional group is defined by membership in specific categories across multiple dimensions (e.g., specific age category and specific gender).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
group_labels_dict
|
dict
|
Dictionary mapping category names to specific group labels that define the intersectional group (e.g., {'age': 'Older', 'gender': 'Female'}). |
required |
subject_labels_dict
|
dict
|
Dictionary mapping category names to lists of labels for each observation in the evaluation dataset. |
required |
predictions
|
list[bool]
|
A list of predicted diagnoses for each observation in the evaluation dataset. |
required |
true_statuses
|
list[bool]
|
A list of true diagnoses for each observation in the evaluation dataset. |
required |
Returns:
| Type | Description |
|---|---|
float
|
The false positive rate of the model in the specified intersectional group. Returns np.nan if the group has no observations. |
max_intersect_acc_diff
¶
Calculate the maximum difference in accuracy across intersectional groups.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
subject_labels_dict
|
dict
|
Dictionary mapping category names to lists of labels for each observation in the evaluation dataset. |
required |
predictions
|
list[bool]
|
A list of predicted diagnoses for each observation in the evaluation dataset. |
required |
true_statuses
|
list[bool]
|
A list of true diagnoses for each observation in the evaluation dataset. |
required |
Returns:
| Type | Description |
|---|---|
float
|
The maximum difference between any two intersectional group accuracies. Returns np.nan if any group has no observations. |
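A minimal sketch of the idea follows, assuming that the intersectional groups are every combination of the labels observed in each category (this would explain how a group can end up with no observations). The function name max_accuracy_difference is hypothetical.

```python
import math
from itertools import product

def max_accuracy_difference(subject_labels_dict, predictions, true_statuses):
    """Largest gap between any two intersectional group accuracies."""
    categories = list(subject_labels_dict)
    # Assumption: groups are the Cartesian product of the labels seen per category.
    label_sets = [sorted(set(subject_labels_dict[c])) for c in categories]
    accuracies = []
    for combo in product(*label_sets):
        members = [i for i in range(len(predictions))
                   if all(subject_labels_dict[c][i] == label
                          for c, label in zip(categories, combo))]
        if not members:
            return math.nan  # an empty group makes the result undefined
        correct = sum(1 for i in members if predictions[i] == true_statuses[i])
        accuracies.append(correct / len(members))
    return max(accuracies) - min(accuracies)

subject_labels = {
    "age": ["Older", "Older", "Older", "Younger", "Younger", "Younger"],
    "gender": ["F", "F", "M", "F", "M", "M"],
}
predictions = [True, False, True, False, True, False]
true_statuses = [True, True, True, False, False, False]

gap = max_accuracy_difference(subject_labels, predictions, true_statuses)
print(gap)
```

Because the largest pairwise difference is always attained by the best and worst groups, computing max minus min is enough; no explicit pairwise loop is needed.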
max_intersect_acc_ratio
¶
Calculate the maximum ratio of accuracies across intersectional groups.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
subject_labels_dict
|
dict
|
Dictionary mapping category names to lists of labels for each observation in the evaluation dataset. |
required |
predictions
|
list[bool]
|
A list of predicted diagnoses for each observation in the evaluation dataset. |
required |
true_statuses
|
list[bool]
|
A list of true diagnoses for each observation in the evaluation dataset. |
required |
natural_log
|
bool
|
If True, return the natural logarithm of the ratio. Default is True. |
True
|
Returns:
| Type | Description |
|---|---|
float
|
The (log) ratio of the maximum to minimum accuracy across all intersectional groups. Returns np.nan if any group has no observations or if any accuracy is 0. |
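The ratio step can be sketched on its own, given per-group values that have already been computed. The function name max_min_ratio and the example group keys are hypothetical; the NaN handling mirrors the return description above.

```python
import math

def max_min_ratio(rates, natural_log=True):
    """Ratio of the largest to the smallest per-group value, optionally logged."""
    values = list(rates.values())
    # NaN marks a group without observations; a zero minimum makes the ratio undefined.
    if any(math.isnan(v) for v in values) or min(values) == 0:
        return math.nan
    ratio = max(values) / min(values)
    return math.log(ratio) if natural_log else ratio

accuracies = {"Older/F": 0.9, "Older/M": 0.6, "Younger/F": 0.75}
print(max_min_ratio(accuracies, natural_log=False))
print(max_min_ratio(accuracies))
```

On the log scale, 0 corresponds to perfect parity (a ratio of 1), and larger values indicate a wider gap between the best and worst groups.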
max_intersect_fdr_diff
¶
Calculate the maximum difference in false discovery rate across all intersectional groups.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
subject_labels_dict
|
dict
|
Dictionary mapping category names to lists of labels for each observation in the evaluation dataset. |
required |
predictions
|
list[bool]
|
A list of predicted diagnoses for each observation in the evaluation dataset. |
required |
true_statuses
|
list[bool]
|
A list of true diagnoses for each observation in the evaluation dataset. |
required |
Returns:
| Type | Description |
|---|---|
float
|
The difference between the maximum and minimum false discovery rate across all intersectional groups. Returns np.nan if any group has no observations. |
max_intersect_fdr_ratio
¶
Calculate the ratio of the maximum to minimum false discovery rate across all intersectional groups.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
subject_labels_dict
|
dict
|
Dictionary mapping category names to lists of labels for each observation in the evaluation dataset. |
required |
predictions
|
list[bool]
|
A list of predicted diagnoses for each observation in the evaluation dataset. |
required |
true_statuses
|
list[bool]
|
A list of true diagnoses for each observation in the evaluation dataset. |
required |
natural_log
|
bool
|
If True, return the natural logarithm of the ratio. Default is True. |
True
|
Returns:
| Type | Description |
|---|---|
float
|
The (log) ratio of the maximum to minimum false discovery rate across all intersectional groups. Returns np.nan if any group has no observations or if any false discovery rate is 0. |
max_intersect_fnr_ratio
¶
Calculate the ratio of the maximum to minimum false negative rate across all intersectional groups.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
subject_labels_dict
|
dict
|
Dictionary mapping category names to lists of labels for each observation in the evaluation dataset. |
required |
predictions
|
list[bool]
|
A list of predicted diagnoses for each observation in the evaluation dataset. |
required |
true_statuses
|
list[bool]
|
A list of true diagnoses for each observation in the evaluation dataset. |
required |
natural_log
|
bool
|
If True, return the natural logarithm of the ratio. Default is True. |
True
|
Returns:
| Type | Description |
|---|---|
float
|
The (log) ratio of the maximum to minimum false negative rate across all intersectional groups. Returns np.nan if any group has no observations or if any false negative rate is 0. |
max_intersect_for_diff
¶
Calculate the maximum difference in false omission rate across all intersectional groups.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
subject_labels_dict
|
dict
|
Dictionary mapping category names to lists of labels for each observation in the evaluation dataset. |
required |
predictions
|
list[bool]
|
A list of predicted diagnoses for each observation in the evaluation dataset. |
required |
true_statuses
|
list[bool]
|
A list of true diagnoses for each observation in the evaluation dataset. |
required |
Returns:
| Type | Description |
|---|---|
float
|
The difference between the maximum and minimum false omission rate across all intersectional groups. Returns np.nan if any group has no observations. |
max_intersect_for_ratio
¶
Calculate the ratio of the maximum to minimum false omission rate across all intersectional groups.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
subject_labels_dict
|
dict
|
Dictionary mapping category names to lists of labels for each observation in the evaluation dataset. |
required |
predictions
|
list[bool]
|
A list of predicted diagnoses for each observation in the evaluation dataset. |
required |
true_statuses
|
list[bool]
|
A list of true diagnoses for each observation in the evaluation dataset. |
required |
natural_log
|
bool
|
If True, return the natural logarithm of the ratio. Default is True. |
True
|
Returns:
| Type | Description |
|---|---|
float
|
The (log) ratio of the maximum to minimum false omission rate across all intersectional groups. Returns np.nan if any group has no observations or if any false omission rate is 0. |
max_intersect_fpr_diff
¶
Calculate the maximum difference in false positive rate across all intersectional groups.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
subject_labels_dict
|
dict
|
Dictionary mapping category names to lists of labels for each observation in the evaluation dataset. |
required |
predictions
|
list[bool]
|
A list of predicted diagnoses for each observation in the evaluation dataset. |
required |
true_statuses
|
list[bool]
|
A list of true diagnoses for each observation in the evaluation dataset. |
required |
Returns:
| Type | Description |
|---|---|
float
|
The difference between the maximum and minimum false positive rate across all intersectional groups. Returns np.nan if any group has no observations. |
max_intersect_fpr_ratio
¶
Calculate the ratio of the maximum to minimum false positive rate across all intersectional groups.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
subject_labels_dict
|
dict
|
Dictionary mapping category names to lists of labels for each observation in the evaluation dataset. |
required |
predictions
|
list[bool]
|
A list of predicted diagnoses for each observation in the evaluation dataset. |
required |
true_statuses
|
list[bool]
|
A list of true diagnoses for each observation in the evaluation dataset. |
required |
natural_log
|
bool
|
If True, return the natural logarithm of the ratio. Default is True. |
True
|
Returns:
| Type | Description |
|---|---|
float
|
The (log) ratio of the maximum to minimum false positive rate across all intersectional groups. Returns np.nan if any group has no observations or if any false positive rate is 0. |
fairness.single_metrics¶
calculate_AOD
¶
Compute the Average Odds Difference (AOD) between demographic groups.
Average Odds Difference is the mean of the differences in True Positive Rate (TPR) and False Positive Rate (FPR) between the unprivileged and privileged groups. It captures disparities in model performance for both positive and negative outcomes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
y_test
|
array-like of shape (n_samples,)
|
Ground-truth binary labels. Expected values: 0 (negative outcome) or 1 (positive outcome). |
required |
y_pred
|
array-like of shape (n_samples,)
|
Predicted binary labels from a classifier. Expected values: 0 (negative outcome) or 1 (positive outcome). |
required |
group_labels
|
array-like of shape (n_samples,)
|
Protected attribute label for each sample. Each entry corresponds to the same-indexed sample in y_test and y_pred. |
required |
privileged_label
|
str
|
The label within group_labels considered to be the privileged group (e.g. 'Male' for sex, 'Older' for age). All other labels are treated as unprivileged. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
AOD |
float
|
Average Odds Difference: the mean of the TPR difference and the FPR difference between the unprivileged and privileged groups. Values closer to 0 indicate better fairness. |
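One common way to write this metric is 0.5 * ((FPR_unpriv - FPR_priv) + (TPR_unpriv - TPR_priv)). The sketch below implements that signed form as an assumption; the toolkit may use a different sign convention or absolute values. All helper names are hypothetical.

```python
def rate(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")

def tpr_and_fpr(y_test, y_pred):
    """True positive rate and false positive rate for binary 0/1 labels."""
    tp = sum(1 for t, p in zip(y_test, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_test, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_test, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_test, y_pred) if t == 0 and p == 0)
    return rate(tp, tp + fn), rate(fp, fp + tn)

def select(values, group_labels, privileged_label, privileged):
    """Keep the values whose group is (or is not) the privileged one."""
    return [v for v, g in zip(values, group_labels)
            if (g == privileged_label) == privileged]

def average_odds_difference(y_test, y_pred, group_labels, privileged_label):
    tpr_p, fpr_p = tpr_and_fpr(select(y_test, group_labels, privileged_label, True),
                               select(y_pred, group_labels, privileged_label, True))
    tpr_u, fpr_u = tpr_and_fpr(select(y_test, group_labels, privileged_label, False),
                               select(y_pred, group_labels, privileged_label, False))
    # Assumption: signed differences, unprivileged minus privileged.
    return 0.5 * ((fpr_u - fpr_p) + (tpr_u - tpr_p))

y_test = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["Male"] * 4 + ["Female"] * 4

aod = average_odds_difference(y_test, y_pred, groups, "Male")
print(aod)
```

In this data the privileged group has TPR 1.0 and FPR 0.5, while the unprivileged group has TPR 0.5 and FPR 0.0, so under the signed convention the result is negative.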
calculate_DI
¶
Compute Disparate Impact (DI) between demographic groups.
Disparate Impact measures the ratio of positive prediction rates between the unprivileged and privileged groups. It evaluates whether one group receives favorable outcomes less frequently than another, regardless of ground-truth labels.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
y_pred
|
array-like of shape (n_samples,)
|
Predicted binary labels from a classifier. Expected values: 0 (negative outcome) or 1 (positive outcome). |
required |
group_labels
|
array-like of shape (n_samples,)
|
Protected attribute label for each sample. Each entry corresponds to the same-indexed sample in y_pred. |
required |
privileged_label
|
str
|
The label within group_labels considered to be the privileged group (e.g. 'Male' for sex, 'Older' for age). All other labels are treated as unprivileged. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
DI |
float
|
Disparate Impact, defined as:
DI = P(ŷ = 1 | unprivileged) / P(ŷ = 1 | privileged)
where P(ŷ = 1 | group) is the positive prediction rate for the specified group. |
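A minimal sketch of the ratio, assuming the unprivileged rate is the numerator and the privileged rate the denominator, and assuming NaN is returned when the privileged rate is zero. The function names are hypothetical.

```python
import math

def positive_rate(y_pred):
    """Share of predictions equal to 1, or NaN for an empty group."""
    return sum(y_pred) / len(y_pred) if y_pred else math.nan

def disparate_impact(y_pred, group_labels, privileged_label):
    privileged = [p for p, g in zip(y_pred, group_labels) if g == privileged_label]
    unprivileged = [p for p, g in zip(y_pred, group_labels) if g != privileged_label]
    privileged_rate = positive_rate(privileged)
    if privileged_rate == 0:
        return math.nan  # assumption: the ratio is undefined for a zero denominator
    return positive_rate(unprivileged) / privileged_rate

y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["Male"] * 4 + ["Female"] * 4

di = disparate_impact(y_pred, groups, "Male")
print(di)
```

A value of 1 indicates equal positive prediction rates; values below 1 mean the unprivileged group receives positive predictions less often. Ground-truth labels play no part in the calculation.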
calculate_EOD
¶
Compute the Equal Opportunity Difference (EOD) between demographic groups.
Equal Opportunity Difference measures the absolute difference in True Positive Rates (TPR) between the unprivileged and privileged groups. A lower EOD indicates fairer performance with respect to correctly identifying positive cases across groups.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
y_test
|
array-like of shape (n_samples,)
|
Ground-truth binary labels. Expected values: 0 (negative outcome) or 1 (positive outcome). |
required |
y_pred
|
array-like of shape (n_samples,)
|
Predicted binary labels from a classifier. Expected values: 0 (negative outcome) or 1 (positive outcome). |
required |
group_labels
|
array-like of shape (n_samples,)
|
Protected attribute label for each sample. Each entry corresponds to the same-indexed sample in y_test and y_pred. |
required |
privileged_label
|
str
|
The label within group_labels considered to be the privileged group (e.g. 'Male' for sex, 'Older' for age). All other labels are treated as unprivileged. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
EOD |
float
|
Equal Opportunity Difference, defined as:
EOD = abs(TPR(unprivileged) - TPR(privileged))
Values closer to 0 indicate better fairness. |
Notes
- EOD focuses exclusively on the positive class (y = 1).
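The note above can be checked with a small standalone sketch: the absolute TPR gap is computed per group, and changing predictions for negative cases (y = 0) leaves it untouched. The function names are hypothetical and the code is an illustration, not the toolkit's implementation.

```python
def true_positive_rate(y_test, y_pred):
    """Share of actual positives (y = 1) that were predicted positive."""
    predictions_for_positives = [p for t, p in zip(y_test, y_pred) if t == 1]
    if not predictions_for_positives:
        return float("nan")  # assumption: undefined without actual positives
    return sum(predictions_for_positives) / len(predictions_for_positives)

def equal_opportunity_difference(y_test, y_pred, group_labels, privileged_label):
    samples = list(zip(y_test, y_pred, group_labels))
    priv = [(t, p) for t, p, g in samples if g == privileged_label]
    unpriv = [(t, p) for t, p, g in samples if g != privileged_label]
    tpr_priv = true_positive_rate([t for t, _ in priv], [p for _, p in priv])
    tpr_unpriv = true_positive_rate([t for t, _ in unpriv], [p for _, p in unpriv])
    return abs(tpr_unpriv - tpr_priv)

y_test = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["Male"] * 4 + ["Female"] * 4
eod = equal_opportunity_difference(y_test, y_pred, groups, "Male")

# Same predictions for positive cases, flipped predictions for negative cases.
y_pred_flipped = [1, 1, 0, 1, 1, 0, 1, 1]
eod_flipped = equal_opportunity_difference(y_test, y_pred_flipped, groups, "Male")
print(eod, eod_flipped)
```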
calculate_TPR_TNR_FPR_FNR
¶
Compute classification rate metrics derived from the confusion matrix.
Notes
- Counts must be non-negative integers.
- Label 1 is assumed to be the positive outcome.
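For reference, the four rates follow directly from the confusion matrix counts. The sketch below is a standalone illustration; the function name, the argument order, the dictionary return value, and the NaN result for a zero denominator are all assumptions rather than the toolkit's actual signature.

```python
import math

def rates_from_counts(tp, fn, fp, tn):
    """Derive TPR, TNR, FPR and FNR from confusion matrix counts."""
    def safe_div(numerator, denominator):
        # Assumption: a rate with an empty denominator is reported as NaN.
        return numerator / denominator if denominator else math.nan

    return {
        "TPR": safe_div(tp, tp + fn),  # sensitivity / recall
        "TNR": safe_div(tn, tn + fp),  # specificity
        "FPR": safe_div(fp, fp + tn),
        "FNR": safe_div(fn, fn + tp),
    }

rates = rates_from_counts(tp=30, fn=10, fp=5, tn=55)
print(rates)
```

TPR and FNR share the denominator TP + FN and therefore sum to 1, as do TNR and FPR with the denominator TN + FP.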
calculate_TP_FN_FP_TN
¶
Compute the confusion matrix components: True Positives (TP), False Negatives (FN), False Positives (FP), and True Negatives (TN).
Notes
- Binary classification is assumed.
- Label 1 denotes the positive outcome.
- Label 0 denotes the negative outcome.
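Under the assumptions in these notes, the counting step can be sketched as follows. The function name confusion_counts is hypothetical, the return order simply follows the order in the documented function's name, and raising on non-binary labels is an assumption.

```python
def confusion_counts(y_test, y_pred):
    """Count TP, FN, FP and TN for binary labels (1 = positive, 0 = negative)."""
    tp = fn = fp = tn = 0
    for truth, pred in zip(y_test, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            # Assumption: anything other than 0/1 is rejected.
            raise ValueError(f"labels must be 0 or 1, got {truth!r} and {pred!r}")
    return tp, fn, fp, tn

y_test = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0]
counts = confusion_counts(y_test, y_pred)
print(counts)
```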
group_to_binary
¶
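Convert group labels to a binary privileged/unprivileged encoding, so that the single-attribute fairness functions can be applied to intersectional group labels.
- labels: list of group labels (e.g. 'Male', 'Female').
- privileged_label: label considered privileged.
- Returns: numpy array (1 = privileged, 0 = unprivileged).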
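The encoding itself is simple enough to sketch in a few lines. The function name to_binary and the combined labels in the example are hypothetical, and the sketch returns a plain list rather than a numpy array to stay dependency-free.

```python
def to_binary(labels, privileged_label):
    """Encode each label as 1 if it equals the privileged label, else 0."""
    return [1 if label == privileged_label else 0 for label in labels]

# Hypothetical intersectional labels combining age and sex.
labels = ["Older_Male", "Younger_Female", "Older_Male", "Older_Female"]
encoded = to_binary(labels, "Older_Male")
print(encoded)
```

Every label other than the privileged one collapses to 0, so all remaining groups are treated as a single unprivileged group.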
fairness.visualisation¶
Visualization helpers for fairness metrics.
This module contains lightweight plotting utilities that sit on top of the fairness.metrics and fairness.single_metrics APIs. The functions do not compute metrics themselves; they only visualize metric outputs computed from group labels, predictions, and ground-truth labels.
The typical workflow is:
1) Prepare evaluation inputs (see fairness.groups.make_eval_df and fairness.adapters).
2) Compute or select a metric function from fairness.metrics or fairness.single_metrics.
3) Use the plotting helpers here to visualize metric values across groups.
All plotting helpers return a Matplotlib Figure so callers can further customize or save the plots as needed.
plot_group_metric
¶
Plot a group-level metric computed with fairness.metrics (group_*).
This function expects a metric that takes a single group label, a list
of subject labels, predictions, and true labels, and returns a scalar
value for that group (e.g., group_acc, group_fnr, group_fpr).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
metric_fn
|
callable
|
A group_* function from fairness.metrics (e.g., group_acc, group_fnr, group_fpr). |
required |
subject_labels
|
Iterable
|
Group label for each sample (e.g., intersectional labels). |
required |
predictions
|
Iterable
|
Predicted labels aligned with subject_labels. |
required |
true_statuses
|
Iterable
|
Ground-truth labels aligned with subject_labels. |
required |
groups
|
Sequence or None
|
Subset/ordering of groups to plot. If None, all unique labels are used. |
None
|
title
|
str or None
|
Plot title. Defaults to the metric function name. |
None
|
rotation
|
int
|
Rotation angle for x tick labels. |
45
|
figsize
|
tuple[float, float] or None
|
Figure size in inches. If None, a default size is chosen. |
None
|
sort
|
bool
|
If True, sort bars by metric value (NaNs placed at the end). |
False
|
Returns:
| Type | Description |
|---|---|
Figure
|
The created Matplotlib figure. |
Raises:
| Type | Description |
|---|---|
ValueError
|
If inputs do not share the same length. |
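The data preparation behind such a plot can be sketched without Matplotlib: validate the input lengths, evaluate the metric once per group, and optionally order the results with NaNs last. The names group_values and group_accuracy are hypothetical stand-ins, and ascending sort order is an assumption; the metric signature follows the description above (group label, subject labels, predictions, true labels).

```python
import math

def group_accuracy(group_label, subject_labels, predictions, true_statuses):
    """Stand-in metric: accuracy within one group, NaN if the group is empty."""
    members = [i for i, s in enumerate(subject_labels) if s == group_label]
    if not members:
        return math.nan
    return sum(predictions[i] == true_statuses[i] for i in members) / len(members)

def group_values(metric_fn, subject_labels, predictions, true_statuses,
                 groups=None, sort=False):
    """Return (group, value) pairs ready to be drawn as bars."""
    subject_labels = list(subject_labels)
    predictions = list(predictions)
    true_statuses = list(true_statuses)
    if not (len(subject_labels) == len(predictions) == len(true_statuses)):
        raise ValueError("inputs must share the same length")
    if groups is None:
        groups = list(dict.fromkeys(subject_labels))  # unique labels, first-seen order
    pairs = [(g, metric_fn(g, subject_labels, predictions, true_statuses))
             for g in groups]
    if sort:
        # Assumption: ascending order; NaN values are pushed to the end.
        pairs.sort(key=lambda gv: (math.isnan(gv[1]),
                                   0.0 if math.isnan(gv[1]) else gv[1]))
    return pairs

labels = ["A", "A", "B", "B", "C"]
preds = [1, 0, 1, 1, 0]
truth = [1, 1, 1, 1, 0]
pairs = group_values(group_accuracy, labels, preds, truth,
                     groups=["D", "B", "A"], sort=True)
print(pairs)
```

The resulting names and values could then be handed to a bar chart; group "D" has no samples, so its NaN value ends up last.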
plot_group_metric_from_eval_df
¶
Convenience wrapper for an eval_df produced by
fairness.groups.make_eval_df.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
metric_fn
|
callable
|
A group_* metric function from fairness.metrics. |
required |
eval_df
|
DataFrame
|
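DataFrame produced by fairness.groups.make_eval_df, containing the group-label column named by label_col together with the prediction and ground-truth columns. |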
required |
label_col
|
str
|
Column name for group labels (default "subject_label"). |
'subject_label'
|
title
|
str or None
|
Plot title. |
None
|
rotation
|
int
|
Rotation angle for x tick labels. |
45
|
figsize
|
tuple[float, float] or None
|
Figure size in inches. |
None
|
sort
|
bool
|
If True, sort bars by metric value (NaNs placed at the end). |
False
|
Returns:
| Type | Description |
|---|---|
Figure
|
The created Matplotlib figure. |
Raises:
| Type | Description |
|---|---|
ValueError
|
If required columns are missing from eval_df. |
plot_intersectional_metric
¶
Plot an all_intersect_* metric from fairness.metrics (dict -> bar plot).
Functions such as all_intersect_accs, all_intersect_fprs, etc. return
a dictionary mapping intersectional group labels to metric values. This
helper converts that dictionary into a horizontal bar plot.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
metric_fn
|
callable
|
An all_intersect_* function from fairness.metrics (e.g., all_intersect_accs, all_intersect_fprs). |
required |
subject_labels_dict
|
Mapping[str, Sequence]
|
Mapping from protected attribute name to labels per sample. |
required |
predictions
|
Iterable
|
Predicted labels aligned with the label lists in subject_labels_dict. |
required |
true_statuses
|
Iterable
|
Ground-truth labels aligned with the label lists in subject_labels_dict. |
required |
title
|
str or None
|
Plot title. Defaults to the metric function name. |
None
|
rotation
|
int
|
Rotation angle for tick labels. |
0
|
figsize
|
tuple[float, float] or None
|
Figure size in inches. |
None
|
sort
|
bool
|
If True, sort bars by metric value (NaNs placed at the end). |
True
|
Returns:
| Type | Description |
|---|---|
Figure
|
The created Matplotlib figure. |
Raises:
| Type | Description |
|---|---|
ValueError
|
If predictions and true_statuses lengths differ. |
TypeError
|
If metric_fn does not return a dictionary. |
plot_pairwise_group_metric
¶
Plot pairwise group metrics (group_diff, group_ratio).
Pairwise metric functions compare two groups at a time and return a scalar (e.g., difference or ratio of accuracies).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
metric_fn
|
callable
|
A pairwise metric function from fairness.metrics that compares two groups and returns a scalar. |
required |
subject_labels
|
Iterable
|
Group label for each sample. |
required |
predictions
|
Iterable
|
Predicted labels aligned with subject_labels. |
required |
true_statuses
|
Iterable
|
Ground-truth labels aligned with subject_labels. |
required |
group_pairs
|
Sequence[tuple] or None
|
Explicit list of (group_a, group_b) pairs to plot. If None, all pairwise combinations of unique groups are used. |
None
|
title
|
str or None
|
Plot title. Defaults to the metric function name. |
None
|
rotation
|
int
|
Rotation angle for x tick labels (used for vertical plots only). |
45
|
figsize
|
tuple[float, float] or None
|
Figure size in inches. |
None
|
sort
|
bool
|
If True, sort bars by metric value (NaNs placed at the end). |
True
|
Returns:
| Type | Description |
|---|---|
Figure
|
The created Matplotlib figure. |
Raises:
| Type | Description |
|---|---|
ValueError
|
If no group pairs are provided or generated. |
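The default pairing behaviour described for group_pairs can be sketched with itertools.combinations. The helper name resolve_group_pairs is hypothetical, and using first-seen order for the unique groups is an assumption.

```python
from itertools import combinations

def resolve_group_pairs(subject_labels, group_pairs=None):
    """Return explicit pairs, or every unordered pair of the unique groups."""
    if group_pairs is None:
        unique_groups = list(dict.fromkeys(subject_labels))  # first-seen order
        group_pairs = list(combinations(unique_groups, 2))
    else:
        group_pairs = list(group_pairs)
    if not group_pairs:
        raise ValueError("no group pairs were provided or generated")
    return group_pairs

pairs = resolve_group_pairs(["A", "B", "A", "C"])
print(pairs)
```

With n unique groups this produces n * (n - 1) / 2 bars, so passing an explicit group_pairs list may be preferable when there are many groups. A single unique group yields no pairs and triggers the ValueError.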
plot_scalar_metrics
¶
Plot one or more scalar metrics (e.g., max_intersect_* outputs).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
metrics
|
Mapping[str, float]
|
Mapping from metric name to scalar value. |
required |
title
|
str or None
|
Plot title. |
None
|
rotation
|
int
|
Rotation angle for x tick labels. |
0
|
figsize
|
tuple[float, float] or None
|
Figure size in inches. |
None
|
Returns:
| Type | Description |
|---|---|
Figure
|
The created Matplotlib figure. |
plot_single_metrics
¶
Plot single-attribute fairness metrics from fairness.single_metrics.
This helper computes and visualizes metrics such as EOD, AOD, and DI for a single protected attribute with a specified privileged group. Note that DI uses only predictions, while EOD and AOD require y_test.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
y_test
|
Iterable
|
Ground-truth binary labels (0/1). |
required |
y_pred
|
Iterable
|
Predicted binary labels (0/1). |
required |
group_labels
|
Iterable
|
Protected attribute labels aligned to y_test/y_pred. |
required |
privileged_label
|
object
|
Label treated as the privileged group. |
required |
metrics
|
Sequence[str] or None
|
Subset of {"EOD", "AOD", "DI"} to compute. Defaults to all. |
None
|
title
|
str or None
|
Plot title. |
None
|
rotation
|
int
|
Rotation angle for x tick labels. |
0
|
figsize
|
tuple[float, float] or None
|
Figure size in inches. |
None
|
Returns:
| Type | Description |
|---|---|
Figure
|
The created Matplotlib figure. |
Raises:
| Type | Description |
|---|---|
ValueError
|
If an unknown metric name is requested. |
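The handling of the metrics argument can be sketched as a small validation step. The helper name resolve_metrics and the constant KNOWN_METRICS are hypothetical; only the three metric names and the ValueError come from the description above.

```python
KNOWN_METRICS = ("EOD", "AOD", "DI")

def resolve_metrics(metrics=None):
    """Default to all known metrics and reject any unknown name."""
    if metrics is None:
        return list(KNOWN_METRICS)
    unknown = [name for name in metrics if name not in KNOWN_METRICS]
    if unknown:
        raise ValueError(f"unknown metric name(s): {unknown}")
    return list(metrics)

selected = resolve_metrics(["DI", "EOD"])
print(selected)
print(resolve_metrics())
```

Validating the names before computing anything means a typo such as "EOd" fails fast, before any figure is created.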