API reference

fairness.data

Data loading utilities used by the fairness toolkit.

This module loads tabular datasets into pandas DataFrames and preserves row order and indices, so that downstream steps can keep the following aligned:

  • model predictions (y_pred)
  • true labels (y_test)
  • protected attributes used to construct intersectional groups

Dataset-specific logic (e.g., mapping target labels, binning ages, cleaning special missing-value encodings such as '?') should live in small adapter functions.

Typical usage

from fairness.data import load_csv, load_features_and_target

df = load_csv("data/heart.csv")
X, y = load_features_and_target(df, target_col="HeartDisease")

load_csv

Load a CSV file into a pandas DataFrame.

The CSV may be provided either as a local file path or as a URL (e.g. an HTTP(S) link to a raw CSV file).

Parameters:

Name Type Description Default
path PathLike

Path or URL to the CSV file.

required
index_col Optional[Union[int, str]]

Column to use as the row index (passed to pandas.read_csv). If None, pandas uses a default integer index.

None
na_values Optional[Union[str, Sequence[str]]]

Additional strings to recognise as NA/NaN.

None

Returns:

Type Description
DataFrame

The dataset as a DataFrame.

Raises:

Type Description
FileNotFoundError

If a local file path does not exist.

ValueError

If the loaded CSV is empty.

load_features_and_target

Split a DataFrame into features X and target y.

Parameters:

Name Type Description Default
df DataFrame

Full dataset containing both features and target.

required
target_col str

Name of the target column.

required
drop_cols Sequence[str]

Additional columns to drop from X (e.g., derived protected attributes used only for fairness analysis such as 'age_group').

()

Returns:

Type Description
tuple[DataFrame, Series]

(X, y), where X is a DataFrame of features and y is a Series of labels.

Raises:

Type Description
ValueError

If target_col is not in df, or if resulting X is empty.

load_heart_csv

Load the Heart Disease CSV used in the tutorial.

This is a thin wrapper around load_csv() that also checks that the expected target column is present.

Parameters:

Name Type Description Default
path PathLike

Path to heart.csv.

required
target_col str

Expected target column name (used for validation).

'HeartDisease'

Returns:

Type Description
DataFrame

Loaded dataset.

Raises:

Type Description
ValueError

If the expected target column is missing.

validate_columns

Validate that required columns exist in the DataFrame.

Parameters:

Name Type Description Default
df DataFrame

Input DataFrame.

required
required Iterable[str]

Column names that must be present.

required

Raises:

Type Description
ValueError

If any required column is missing.

fairness.preprocess

Preprocessing utilities for tabular datasets used in fairness analysis.

This module includes:

  • feature engineering (e.g., binning age into age_group)
  • converting raw tabular data into numeric features suitable for ML
  • producing reproducible train/test splits while preserving indices

Design notes

  • The toolkit is model-agnostic: these functions do not require sklearn pipelines, but they produce outputs compatible with sklearn and similar libraries.
  • Protected attributes may be used for fairness analysis even if they are excluded from model training. Derived protected attributes (e.g. age_group) are excluded from model inputs.

Typical usage

from fairness.data import load_csv
from fairness.preprocess import add_age_group, preprocess_tabular, make_train_test_split

df = load_csv("data/heart.csv")
df = add_age_group(df)
df_model = preprocess_tabular(df)
split = make_train_test_split(df_model, target_col="HeartDisease", drop_cols=("age_group",))

SplitData dataclass

Container for a reproducible train/test split.

Attributes:

Name Type Description
X_train, X_test

Feature matrices for training and testing.

y_train, y_test

Target vectors for training and testing.

add_age_group

Add a categorical age-group column derived from a continuous age column.

This is useful for fairness analysis because a continuous protected attribute such as age would otherwise yield one group per distinct value; binning produces a small number of interpretable groups.

Parameters:

Name Type Description Default
df DataFrame

Input dataset.

required
age_col str

Name of the column containing numeric ages.

'Age'
new_col str

Name of the derived categorical column to create.

'age_group'
bins Sequence[float]

Bin edges passed to pandas.cut.

(0, 55, 120)
labels Sequence[str]

Labels assigned to the bins.

('young', 'older')

Returns:

Type Description
DataFrame

Copy of df with the new categorical column added.

Raises:

Type Description
ValueError

If age_col is missing or binning produces missing values.
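The binning rule described above can be sketched without pandas. This is a minimal illustration, not the toolkit's implementation: `bin_age` is a hypothetical helper, and it assumes right-closed intervals, which is the pandas.cut default (right=True).

```python
from bisect import bisect_left
from typing import Sequence


def bin_age(age: float,
            bins: Sequence[float] = (0, 55, 120),
            labels: Sequence[str] = ("young", "older")) -> str:
    """Hypothetical helper: map one age to a label using right-closed bins."""
    # With edges (0, 55, 120) the intervals are (0, 55] and (55, 120].
    idx = bisect_left(bins, age) - 1
    if idx < 0 or idx >= len(labels):
        # Mirrors the documented ValueError when binning leaves a value unassigned.
        raise ValueError(f"age {age!r} falls outside the bin edges {tuple(bins)}")
    return labels[idx]


ages = [40, 55, 56, 120]
groups = [bin_age(a) for a in ages]
```

Under this assumption an age of exactly 55 lands in 'young', and an age of 0 or above 120 falls outside every bin, which is the situation the documented ValueError guards against.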

apply_transforms

Apply a sequence of DataFrame -> DataFrame transforms in order.

Parameters:

Name Type Description Default
df DataFrame

Input dataset.

required
transforms Sequence[Callable[[DataFrame], DataFrame]]

Sequence of callables each returning a modified DataFrame.

required

Returns:

Type Description
DataFrame

Transformed DataFrame.
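Applying transforms in order amounts to a left fold over the sequence. The sketch below shows that idea with plain dicts standing in for DataFrames; `apply_in_order`, `add_flag`, and `rename_age` are hypothetical names used only for illustration.

```python
from functools import reduce
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def apply_in_order(value: T, transforms: Sequence[Callable[[T], T]]) -> T:
    """Hypothetical stand-in: feed the output of each transform into the next."""
    return reduce(lambda acc, fn: fn(acc), transforms, value)


# Plain dicts stand in for DataFrames; each transform returns a new object.
def add_flag(row: dict) -> dict:
    return {**row, "flag": True}


def rename_age(row: dict) -> dict:
    return {("Age" if k == "age" else k): v for k, v in row.items()}


result = apply_in_order({"age": 61}, [add_flag, rename_age])
```

An empty sequence of transforms returns the input unchanged, which makes the function safe to call with an optional pipeline.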

make_train_test_split

Create a reproducible train/test split for modelling.

Parameters:

Name Type Description Default
df DataFrame

Preprocessed dataset containing features and target.

required
target_col str

Name of the target column.

required
drop_cols Sequence[str]

Additional columns to exclude from X (e.g. derived protected attributes).

()
test_size float

Fraction of rows assigned to the test set.

0.3
random_state int

Random seed for reproducibility.

42
stratify bool

If True, stratify split by the target to preserve class balance.

True

Returns:

Type Description
SplitData

Container holding X_train, X_test, y_train, y_test.

Raises:

Type Description
ValueError

If target_col is missing or df is empty.
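The source does not say how the split is implemented, so the following is only a dependency-free sketch of the documented behaviour: a fixed seed, stratification by the target, and original index values kept in both parts. `stratified_split_indices` is a hypothetical helper.

```python
import random
from collections import defaultdict
from typing import Hashable, Sequence


def stratified_split_indices(index: Sequence[Hashable],
                             y: Sequence[Hashable],
                             test_size: float = 0.3,
                             random_state: int = 42):
    """Hypothetical helper: return (train_index, test_index) stratified by y."""
    rng = random.Random(random_state)  # fixed seed -> same split on every run
    by_class = defaultdict(list)
    for idx, label in zip(index, y):
        by_class[label].append(idx)

    test = set()
    for label in sorted(by_class, key=repr):
        members = by_class[label]
        n_test = round(len(members) * test_size)
        test.update(rng.sample(members, n_test))

    # Keep the original row order and the original index values in both parts.
    train_index = [i for i in index if i not in test]
    test_index = [i for i in index if i in test]
    return train_index, test_index


index = list(range(100, 120))          # non-default index values
y = [0] * 10 + [1] * 10                # balanced binary target
train_idx, test_idx = stratified_split_indices(index, y)
```

Because the index values survive the split, the test rows of the original DataFrame can later be recovered by index, which is what keeps predictions, labels, and protected attributes aligned.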

map_binary_column

Map values of a binary/categorical column to new values (e.g., 'M'/'F' -> 1/0).

Parameters:

Name Type Description Default
df DataFrame

Input dataset.

required
col str

Column name to map.

required
mapping Mapping[object, object]

Dictionary defining how to map values.

required
strict bool

If True, raise if unmapped values occur. If False, leave unmapped as-is.

True

Returns:

Type Description
DataFrame

Copy of df with mapped column.

Raises:

Type Description
ValueError

If strict=True and unmapped values are found.
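The difference between strict and non-strict mapping can be illustrated on a plain list of values. `map_values` is a hypothetical helper written for this sketch; it is not the toolkit's code.

```python
from typing import Hashable, Mapping, Sequence


def map_values(values: Sequence[Hashable],
               mapping: Mapping[Hashable, object],
               strict: bool = True) -> list:
    """Hypothetical helper showing the documented strict / non-strict behaviour."""
    unmapped = sorted({v for v in values if v not in mapping}, key=repr)
    if strict and unmapped:
        raise ValueError(f"Unmapped values: {unmapped}")
    # Non-strict mode leaves values without a mapping untouched.
    return [mapping.get(v, v) for v in values]


sex = ["M", "F", "F", "M"]
encoded = map_values(sex, {"M": 1, "F": 0})
lenient = map_values(["M", "X"], {"M": 1}, strict=False)
```

Strict mode is the safer default for protected attributes, since a silently unmapped value would otherwise end up as its own group in later analysis.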

preprocess_tabular

Convert a tabular DataFrame into numeric ML-ready features.

Performs one-hot encoding for categorical columns (object/category) and leaves numeric columns unchanged.

Parameters:

Name Type Description Default
df DataFrame

Input dataset.

required
drop_cols Sequence[str]

Columns to drop prior to encoding.

()
one_hot bool

Whether to one-hot encode categorical columns.

True
drop_first bool

If one_hot=True, drop the first level for each categorical variable to avoid perfect multicollinearity in logistic regression models.

True

Returns:

Type Description
DataFrame

A numeric DataFrame compatible with scikit-learn.
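To see why drop_first matters, consider a plain-Python sketch of one-hot encoding. `one_hot` and the column name "group" are hypothetical; the sketch assumes levels are taken in sorted order.

```python
from typing import Dict, List, Sequence


def one_hot(name: str, values: Sequence[str], drop_first: bool = True) -> Dict[str, List[int]]:
    """Hypothetical helper: one indicator column per level of a categorical column."""
    levels = sorted(set(values))
    if drop_first:
        # The dropped level becomes the baseline: a row of all zeros.
        levels = levels[1:]
    return {f"{name}_{level}": [int(v == level) for v in values] for level in levels}


values = ["b", "c", "a", "c"]
full = one_hot("group", values, drop_first=False)
reduced = one_hot("group", values)

# Without drop_first, the indicator columns of every row sum to exactly 1.
row_sums = [sum(col[i] for col in full.values()) for i in range(len(values))]
```

Since the full set of indicators always sums to 1, it duplicates the information in a model's intercept term; dropping one level removes that redundancy.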

fairness.groups

Minimal utilities for constructing intersectional group labels and producing an evaluation DataFrame aligned with model predictions and true labels.

Primary output: a tidy DataFrame with columns:

  • subject_label (intersectional group label per individual)
  • y_pred (model prediction)
  • y_true (true label)

make_eval_df

Build an evaluation DataFrame for group-based metric functions.

The handoff format for metrics such as accuracy_diff:

subject_labels = eval_df[label_col].tolist()
predictions    = eval_df["y_pred"].tolist()
true_statuses  = eval_df["y_true"].tolist()

Parameters:

Name Type Description Default
df_test DataFrame

Test-set DataFrame in the SAME row order as y_pred and y_true (typically df.loc[split.X_test.index]).

required
protected Sequence[str]

Protected columns used to define intersectional groups.

required
y_pred Sequence

Model predictions aligned to df_test rows.

required
y_true Sequence

True labels aligned to df_test rows.

required
label_col str

Name of the intersectional label column.

'subject_label'

Returns:

Type Description
DataFrame

Columns: subject_label, y_pred, y_true (index preserved).

make_intersectional_labels

Create an intersectional group label for each row of df.

Example: Sex=1|age_group=older

Parameters:

Name Type Description Default
df DataFrame

DataFrame containing protected columns.

required
protected Sequence[str]

Column names to intersect (order defines label format).

required
sep str

Separator placed between the entries for different columns.

'|'
kv_sep str

Separator placed between a column name and its value.

'|'
missing str

Placeholder for missing values.

'NA'

Returns:

Type Description
list[str]

One label per row, aligned with df.
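The label format in the example can be reproduced with a short sketch over plain dict rows. `intersectional_label` is a hypothetical helper; the kv_sep="=" default used here is an assumption chosen to reproduce the example label Sex=1|age_group=older, and None stands in for a missing value.

```python
from typing import Mapping, Sequence


def intersectional_label(row: Mapping[str, object],
                         protected: Sequence[str],
                         sep: str = "|",
                         kv_sep: str = "=",
                         missing: str = "NA") -> str:
    """Hypothetical helper: build one label such as 'Sex=1|age_group=older'."""
    parts = []
    for col in protected:
        value = row.get(col)
        # None stands in for a missing value in this plain-Python sketch.
        shown = missing if value is None else str(value)
        parts.append(f"{col}{kv_sep}{shown}")
    return sep.join(parts)


rows = [
    {"Sex": 1, "age_group": "older"},
    {"Sex": 0, "age_group": None},
]
labels = [intersectional_label(r, ["Sex", "age_group"]) for r in rows]
```

The order of the protected columns fixes the order of the label parts, so the same column list should be used wherever labels are compared.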

fairness.adapters

make_subject_labels_dict

Build the dict-of-lists format expected by intersect_* functions.

Parameters:

Name Type Description Default
df_test DataFrame

Test-set DataFrame containing the protected columns.

required
protected_cols list[str]

E.g. ["Sex", "age_group"]

required

Returns:

Type Description
dict[str, list]

{col: list_of_values_aligned_rowwise_with_eval_df}
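The dict-of-lists shape can be sketched from plain dict rows as follows. `subject_labels_dict` is a hypothetical helper, and "Cholesterol" is an invented non-protected column included only to show that it is left out.

```python
from typing import Dict, List, Mapping, Sequence


def subject_labels_dict(rows: Sequence[Mapping[str, object]],
                        protected_cols: Sequence[str]) -> Dict[str, List[object]]:
    """Hypothetical helper: one list per protected column, in row order."""
    return {col: [row[col] for row in rows] for col in protected_cols}


test_rows = [
    {"Sex": "F", "age_group": "young", "Cholesterol": 180},
    {"Sex": "M", "age_group": "older", "Cholesterol": 240},
]
labels_dict = subject_labels_dict(test_rows, ["Sex", "age_group"])
```

Position i in every list refers to the same test row, so the lists line up with the predictions and true labels passed alongside them.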

unpack_eval_df

Convert eval_df into the list inputs expected by group_* metric functions.

Expects eval_df columns:

  • subject_label (str)
  • y_pred (0/1)
  • y_true (0/1)

Returns:

Name Type Description
subject_labels list[str]
predictions list[int]
true_statuses list[int]

fairness.metrics

all_intersect_accs

Calculate accuracies for all possible intersectional groups.

Computes accuracy for every combination of categories in the dataset (e.g., all age-group-gender combinations).

Parameters:

Name Type Description Default
subject_labels_dict dict

Dictionary mapping category names to lists of labels for each observation in the evaluation dataset.

required
predictions list[bool]

A list of predicted diagnoses for each observation in the evaluation dataset.

required
true_statuses list[bool]

A list of true diagnoses for each observation in the evaluation dataset.

required

Returns:

Type Description
dict

Dictionary mapping intersectional group names (formatted as "label1 + label2 + ...") to their respective accuracies.
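Enumerating every combination of categories is a Cartesian product over the levels of each category. The sketch below is a hypothetical re-implementation for illustration (`all_group_accuracies` is not the toolkit function), and it uses math.nan in place of np.nan to stay dependency-free.

```python
import math
from itertools import product
from typing import Dict, List, Sequence


def all_group_accuracies(subject_labels_dict: Dict[str, List[str]],
                         predictions: Sequence[bool],
                         true_statuses: Sequence[bool]) -> Dict[str, float]:
    """Hypothetical helper: accuracy for every combination of category levels."""
    categories = list(subject_labels_dict)
    levels = [sorted(set(subject_labels_dict[c])) for c in categories]
    result = {}
    for combo in product(*levels):
        # Rows whose label matches the combination in every category.
        rows = [i for i in range(len(predictions))
                if all(subject_labels_dict[c][i] == lv for c, lv in zip(categories, combo))]
        hits = sum(predictions[i] == true_statuses[i] for i in rows)
        result[" + ".join(combo)] = hits / len(rows) if rows else math.nan
    return result


labels = {"age": ["young", "young", "older", "older"],
          "gender": ["F", "M", "F", "F"]}
accs = all_group_accuracies(labels,
                            predictions=[True, False, True, False],
                            true_statuses=[True, True, True, True])
```

Here the combination "older + M" has no rows, so its accuracy is NaN rather than 0, which keeps empty groups distinguishable from groups the model gets entirely wrong.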

all_intersect_fdrs

Calculate false discovery rates for all possible intersectional groups.

Computes false discovery rate for every combination of categories in the dataset (e.g., all age-group-gender combinations).

Parameters:

Name Type Description Default
subject_labels_dict dict

Dictionary mapping category names to lists of labels for each observation in the evaluation dataset.

required
predictions list[bool]

A list of predicted diagnoses for each observation in the evaluation dataset.

required
true_statuses list[bool]

A list of true diagnoses for each observation in the evaluation dataset.

required

Returns:

Type Description
dict

Dictionary mapping intersectional group names (as strings with ' + ' separating categories) to their false discovery rates.

all_intersect_fnrs

Calculate false negative rates for all possible intersectional groups.

Computes false negative rate for every combination of categories in the dataset (e.g., all age-group-gender combinations).

Parameters:

Name Type Description Default
subject_labels_dict dict

Dictionary mapping category names to lists of labels for each observation in the evaluation dataset.

required
predictions list[bool]

A list of predicted diagnoses for each observation in the evaluation dataset.

required
true_statuses list[bool]

A list of true diagnoses for each observation in the evaluation dataset.

required

Returns:

Type Description
dict

Dictionary mapping intersectional group names (as strings with ' + ' separating categories) to their false negative rates.

all_intersect_fors

Calculate false omission rates for all possible intersectional groups.

Computes false omission rate for every combination of categories in the dataset (e.g., all age-group-gender combinations).

Parameters:

Name Type Description Default
subject_labels_dict dict

Dictionary mapping category names to lists of labels for each observation in the evaluation dataset.

required
predictions list[bool]

A list of predicted diagnoses for each observation in the evaluation dataset.

required
true_statuses list[bool]

A list of true diagnoses for each observation in the evaluation dataset.

required

Returns:

Type Description
dict

Dictionary mapping intersectional group names (as strings with ' + ' separating categories) to their false omission rates.

all_intersect_fprs

Calculate false positive rates for all possible intersectional groups.

Computes false positive rate for every combination of categories in the dataset (e.g., all age-group-gender combinations).

Parameters:

Name Type Description Default
subject_labels_dict dict

Dictionary mapping category names to lists of labels for each observation in the evaluation dataset.

required
predictions list[bool]

A list of predicted diagnoses for each observation in the evaluation dataset.

required
true_statuses list[bool]

A list of true diagnoses for each observation in the evaluation dataset.

required

Returns:

Type Description
dict

Dictionary mapping intersectional group names (as strings with ' + ' separating categories) to their false positive rates.
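The four error rates used by the all_intersect_* functions above follow the standard confusion-matrix definitions. The sketch below computes them for one set of observations; `error_rates` is a hypothetical helper, and returning NaN for a zero denominator is an assumption, since the source only documents NaN for groups with no observations.

```python
import math
from typing import Dict, Sequence


def error_rates(predictions: Sequence[bool], true_statuses: Sequence[bool]) -> Dict[str, float]:
    """Hypothetical helper: standard confusion-matrix error rates."""
    pairs = list(zip(predictions, true_statuses))
    tp = sum(p and t for p, t in pairs)
    fp = sum(p and not t for p, t in pairs)
    fn = sum(not p and t for p, t in pairs)
    tn = sum(not p and not t for p, t in pairs)

    def ratio(num: int, den: int) -> float:
        # Assumption: an undefined rate (zero denominator) is reported as NaN.
        return num / den if den else math.nan

    return {
        "fdr": ratio(fp, fp + tp),   # false discoveries among predicted positives
        "fnr": ratio(fn, fn + tp),   # missed cases among actual positives
        "for": ratio(fn, fn + tn),   # missed cases among predicted negatives
        "fpr": ratio(fp, fp + tn),   # false alarms among actual negatives
    }


rates = error_rates(predictions=[True, True, False, False, False],
                    true_statuses=[True, False, True, False, False])
```

FDR and FOR condition on what the model predicted, while FNR and FPR condition on the true status, so the same model can look fair under one pair and unfair under the other.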

group_acc

Find the accuracy of a group with a specific label.

Parameters:

Name Type Description Default
group_label str or int

The label of the group for which the accuracy of the model should be evaluated.

required
subject_labels dict

A dictionary containing subject labels for every observation in the evaluation dataset.

required
predictions list[bool]

A list of predicted diagnoses for each observation in the evaluation dataset.

required
true_statuses list[bool]

A list of true diagnoses for each observation in the evaluation dataset.

required

Returns:

Type Description
float

The accuracy of the model in the specified group. Returns np.nan if the group has no observations.
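Group accuracy is the share of correct predictions among the observations carrying the given label. The sketch below assumes subject labels arrive as a flat sequence with one label per observation (the shape returned by unpack_eval_df); `accuracy_for_group` is a hypothetical helper, and math.nan stands in for np.nan.

```python
import math
from typing import Sequence


def accuracy_for_group(group_label,
                       subject_labels: Sequence,
                       predictions: Sequence[bool],
                       true_statuses: Sequence[bool]) -> float:
    """Hypothetical helper: share of correct predictions within one group."""
    rows = [i for i, label in enumerate(subject_labels) if label == group_label]
    if not rows:
        return math.nan  # documented behaviour for a group with no observations
    return sum(predictions[i] == true_statuses[i] for i in rows) / len(rows)


subject_labels = ["older", "older", "young", "older"]
acc_older = accuracy_for_group("older", subject_labels,
                               predictions=[True, False, True, True],
                               true_statuses=[True, True, False, True])
```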

group_acc_diff

Calculate the absolute difference in accuracy between two groups.

Parameters:

Name Type Description Default
group_a_label str or int

The label of the first group.

required
group_b_label str or int

The label of the second group.

required
subject_labels dict

A dictionary containing subject labels for every observation in the evaluation dataset.

required
predictions list[bool]

A list of predicted diagnoses for each observation in the evaluation dataset.

required
true_statuses list[bool]

A list of true diagnoses for each observation in the evaluation dataset.

required

Returns:

Type Description
float

The absolute difference in accuracy between the two groups. Returns np.nan if either group has no observations.

group_acc_ratio

Calculate the ratio of accuracies between two groups.

Computes the maximum of the two possible ratios (group A / group B and group B / group A) to ensure the ratio is always >= 1.

Parameters:

Name Type Description Default
group_a_label str or int

The label of the first group.

required
group_b_label str or int

The label of the second group.

required
subject_labels dict

A dictionary containing subject labels for every observation in the evaluation dataset.

required
predictions list[bool]

A list of predicted diagnoses for each observation in the evaluation dataset.

required
true_statuses list[bool]

A list of true diagnoses for each observation in the evaluation dataset.

required
natural_log bool

If True, return the natural logarithm of the ratio. Default is True.

True

Returns:

Type Description
float

The (log) ratio of accuracies between the two groups. Returns np.nan if either group has no observations or if either accuracy is 0.
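Taking the larger of the two possible ratios makes the result independent of which group is called A and which B. A minimal sketch of that rule, with the documented NaN cases, might look like this (`symmetric_ratio` is a hypothetical helper operating on two already-computed rates):

```python
import math


def symmetric_ratio(rate_a: float, rate_b: float, natural_log: bool = True) -> float:
    """Hypothetical helper: the larger of a/b and b/a, optionally on the log scale."""
    if math.isnan(rate_a) or math.isnan(rate_b) or rate_a == 0 or rate_b == 0:
        return math.nan  # undefined when a group is empty or a rate is zero
    ratio = max(rate_a / rate_b, rate_b / rate_a)  # always >= 1
    return math.log(ratio) if natural_log else ratio


raw = symmetric_ratio(0.6, 0.9, natural_log=False)
logged = symmetric_ratio(0.6, 0.9)
swapped = symmetric_ratio(0.9, 0.6)
```

On the log scale the value is 0 when both groups have the same rate and grows as they diverge, which puts it on a footing comparable to the absolute difference.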

group_fdr

Find the false discovery rate of a group with a specific label.

Parameters:

Name Type Description Default
group_label str or int

The label of the group for which the false discovery rate of the model should be evaluated.

required
subject_labels dict

A dictionary containing subject labels for every observation in the evaluation dataset.

required
predictions list[bool]

A list of predicted diagnoses for each observation in the evaluation dataset.

required
true_statuses list[bool]

A list of true diagnoses for each observation in the evaluation dataset.

required

Returns:

Type Description
float

The false discovery rate of the model in the specified group. Returns np.nan if the group has no observations.

group_fdr_diff

Calculate the absolute difference in false discovery rate between two groups.

Parameters:

Name Type Description Default
group_a_label str or int

The label of the first group.

required
group_b_label str or int

The label of the second group.

required
subject_labels dict

A dictionary containing subject labels for every observation in the evaluation dataset.

required
predictions list[bool]

A list of predicted diagnoses for each observation in the evaluation dataset.

required
true_statuses list[bool]

A list of true diagnoses for each observation in the evaluation dataset.

required

Returns:

Type Description
float

The absolute difference in false discovery rate between the two groups. Returns np.nan if either group has no observations.

group_fdr_ratio

Calculate the ratio of false discovery rates between two groups.

Computes the maximum of the two possible ratios (group A / group B and group B / group A) to ensure the ratio is always >= 1.

Parameters:

Name Type Description Default
group_a_label str or int

The label of the first group.

required
group_b_label str or int

The label of the second group.

required
subject_labels dict

A dictionary containing subject labels for every observation in the evaluation dataset.

required
predictions list[bool]

A list of predicted diagnoses for each observation in the evaluation dataset.

required
true_statuses list[bool]

A list of true diagnoses for each observation in the evaluation dataset.

required
natural_log bool

If True, return the natural logarithm of the ratio. Default is True.

True

Returns:

Type Description
float

The (log) ratio of false discovery rates between the two groups. Returns np.nan if either group has no observations or if either false discovery rate is 0.

group_fnr

Find the false negative rate of a group with a specific label.

Parameters:

Name Type Description Default
group_label str or int

The label of the group for which the false negative rate of the model should be evaluated.

required
subject_labels dict

A dictionary containing subject labels for every observation in the evaluation dataset.

required
predictions list[bool]

A list of predicted diagnoses for each observation in the evaluation dataset.

required
true_statuses list[bool]

A list of true diagnoses for each observation in the evaluation dataset.

required

Returns:

Type Description
float

The false negative rate of the model in the specified group. Returns np.nan if the group has no observations.

group_fnr_diff

Calculate the absolute difference in false negative rate between two groups.

Parameters:

Name Type Description Default
group_a_label str or int

The label of the first group.

required
group_b_label str or int

The label of the second group.

required
subject_labels dict

A dictionary containing subject labels for every observation in the evaluation dataset.

required
predictions list[bool]

A list of predicted diagnoses for each observation in the evaluation dataset.

required
true_statuses list[bool]

A list of true diagnoses for each observation in the evaluation dataset.

required

Returns:

Type Description
float

The absolute difference in false negative rate between the two groups. Returns np.nan if either group has no observations.

group_fnr_ratio

Calculate the ratio of false negative rates between two groups.

Computes the maximum of the two possible ratios (group A / group B and group B / group A) to ensure the ratio is always >= 1.

Parameters:

Name Type Description Default
group_a_label str or int

The label of the first group.

required
group_b_label str or int

The label of the second group.

required
subject_labels dict

A dictionary containing subject labels for every observation in the evaluation dataset.

required
predictions list[bool]

A list of predicted diagnoses for each observation in the evaluation dataset.

required
true_statuses list[bool]

A list of true diagnoses for each observation in the evaluation dataset.

required
natural_log bool

If True, return the natural logarithm of the ratio. Default is True.

True

Returns:

Type Description
float

The (log) ratio of false negative rates between the two groups. Returns np.nan if either group has no observations or if either false negative rate is 0.

group_for

Find the false omission rate of a group with a specific label.

Parameters:

Name Type Description Default
group_label str or int

The label of the group for which the false omission rate of the model should be evaluated.

required
subject_labels dict

A dictionary containing subject labels for every observation in the evaluation dataset.

required
predictions list[bool]

A list of predicted diagnoses for each observation in the evaluation dataset.

required
true_statuses list[bool]

A list of true diagnoses for each observation in the evaluation dataset.

required

Returns:

Type Description
float

The false omission rate of the model in the specified group. Returns np.nan if the group has no observations.

group_for_diff

Calculate the absolute difference in false omission rate between two groups.

Parameters:

Name Type Description Default
group_a_label str or int

The label of the first group.

required
group_b_label str or int

The label of the second group.

required
subject_labels dict

A dictionary containing subject labels for every observation in the evaluation dataset.

required
predictions list[bool]

A list of predicted diagnoses for each observation in the evaluation dataset.

required
true_statuses list[bool]

A list of true diagnoses for each observation in the evaluation dataset.

required

Returns:

Type Description
float

The absolute difference in false omission rate between the two groups. Returns np.nan if either group has no observations.

group_for_ratio

Calculate the ratio of false omission rates between two groups.

Computes the maximum of the two possible ratios (group A / group B and group B / group A) to ensure the ratio is always >= 1.

Parameters:

Name Type Description Default
group_a_label str or int

The label of the first group.

required
group_b_label str or int

The label of the second group.

required
subject_labels dict

A dictionary containing subject labels for every observation in the evaluation dataset.

required
predictions list[bool]

A list of predicted diagnoses for each observation in the evaluation dataset.

required
true_statuses list[bool]

A list of true diagnoses for each observation in the evaluation dataset.

required
natural_log bool

If True, return the natural logarithm of the ratio. Default is True.

True

Returns:

Type Description
float

The (log) ratio of false omission rates between the two groups. Returns np.nan if either group has no observations or if either false omission rate is 0.

group_fpr

Find the false positive rate of a group with a specific label.

Parameters:

Name Type Description Default
group_label str or int

The label of the group for which the false positive rate of the model should be evaluated.

required
subject_labels dict

A dictionary containing subject labels for every observation in the evaluation dataset.

required
predictions list[bool]

A list of predicted diagnoses for each observation in the evaluation dataset.

required
true_statuses list[bool]

A list of true diagnoses for each observation in the evaluation dataset.

required

Returns:

Type Description
float

The false positive rate of the model in the specified group. Returns np.nan if the group has no observations.

group_fpr_diff

Calculate the absolute difference in false positive rate between two groups.

Parameters:

Name Type Description Default
group_a_label str or int

The label of the first group.

required
group_b_label str or int

The label of the second group.

required
subject_labels dict

A dictionary containing subject labels for every observation in the evaluation dataset.

required
predictions list[bool]

A list of predicted diagnoses for each observation in the evaluation dataset.

required
true_statuses list[bool]

A list of true diagnoses for each observation in the evaluation dataset.

required

Returns:

Type Description
float

The absolute difference in false positive rate between the two groups. Returns np.nan if either group has no observations.

group_fpr_ratio

Calculate the ratio of false positive rates between two groups.

Computes the maximum of the two possible ratios (group A / group B and group B / group A) to ensure the ratio is always >= 1.

Parameters:

Name Type Description Default
group_a_label str or int

The label of the first group.

required
group_b_label str or int

The label of the second group.

required
subject_labels dict

A dictionary containing subject labels for every observation in the evaluation dataset.

required
predictions list[bool]

A list of predicted diagnoses for each observation in the evaluation dataset.

required
true_statuses list[bool]

A list of true diagnoses for each observation in the evaluation dataset.

required
natural_log bool

If True, return the natural logarithm of the ratio. Default is True.

True

Returns:

Type Description
float

The (log) ratio of false positive rates between the two groups. Returns np.nan if either group has no observations or if either false positive rate is 0.

intersect_acc

Calculate accuracy for an intersectional group.

An intersectional group is defined by membership in specific categories across multiple dimensions (e.g., specific age category and specific gender).

Parameters:

Name Type Description Default
group_labels_dict dict

Dictionary mapping category names to specific group labels that define the intersectional group (e.g., {'age': 'Older', 'gender': 'Female'}).

required
subject_labels_dict dict

Dictionary mapping category names to lists of labels for each observation in the evaluation dataset.

required
predictions list[bool]

A list of predicted diagnoses for each observation in the evaluation dataset.

required
true_statuses list[bool]

A list of true diagnoses for each observation in the evaluation dataset.

required

Returns:

Type Description
float

The accuracy of the model in the specified intersectional group. Returns np.nan if the group has no observations.
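Membership in an intersectional group means matching the requested label in every listed category at once. The following sketch shows that selection step; `intersectional_accuracy` is a hypothetical re-implementation for illustration, with math.nan in place of np.nan.

```python
import math
from typing import Dict, List, Sequence


def intersectional_accuracy(group_labels_dict: Dict[str, str],
                            subject_labels_dict: Dict[str, List[str]],
                            predictions: Sequence[bool],
                            true_statuses: Sequence[bool]) -> float:
    """Hypothetical helper: accuracy over rows matching every requested label."""
    rows = [
        i for i in range(len(predictions))
        if all(subject_labels_dict[cat][i] == label
               for cat, label in group_labels_dict.items())
    ]
    if not rows:
        return math.nan
    return sum(predictions[i] == true_statuses[i] for i in rows) / len(rows)


subject_labels_dict = {"age": ["Older", "Older", "Younger", "Older"],
                       "gender": ["Female", "Male", "Female", "Female"]}
acc = intersectional_accuracy({"age": "Older", "gender": "Female"},
                              subject_labels_dict,
                              predictions=[True, True, False, False],
                              true_statuses=[True, False, False, True])
```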

intersect_fdr

Calculate false discovery rate for an intersectional group.

An intersectional group is defined by membership in specific categories across multiple dimensions (e.g., specific age category and specific gender).

Parameters:

Name Type Description Default
group_labels_dict dict

Dictionary mapping category names to specific group labels that define the intersectional group (e.g., {'age': 'Older', 'gender': 'Female'}).

required
subject_labels_dict dict

Dictionary mapping category names to lists of labels for each observation in the evaluation dataset.

required
predictions list[bool]

A list of predicted diagnoses for each observation in the evaluation dataset.

required
true_statuses list[bool]

A list of true diagnoses for each observation in the evaluation dataset.

required

Returns:

Type Description
float

The false discovery rate of the model in the specified intersectional group. Returns np.nan if the group has no observations.

intersect_fnr

Calculate false negative rate for an intersectional group.

An intersectional group is defined by membership in specific categories across multiple dimensions (e.g., specific age category and specific gender).

Parameters:

Name Type Description Default
group_labels_dict dict

Dictionary mapping category names to specific group labels that define the intersectional group (e.g., {'age': 'Older', 'gender': 'Female'}).

required
subject_labels_dict dict

Dictionary mapping category names to lists of labels for each observation in the evaluation dataset.

required
predictions list[bool]

A list of predicted diagnoses for each observation in the evaluation dataset.

required
true_statuses list[bool]

A list of true diagnoses for each observation in the evaluation dataset.

required

Returns:

Type Description
float

The false negative rate of the model in the specified intersectional group. Returns np.nan if the group has no observations.

intersect_for

Calculate false omission rate for an intersectional group.

An intersectional group is defined by membership in specific categories across multiple dimensions (e.g., specific age category and specific gender).

Parameters:

Name Type Description Default
group_labels_dict dict

Dictionary mapping category names to specific group labels that define the intersectional group (e.g., {'age': 'Older', 'gender': 'Female'}).

required
subject_labels_dict dict

Dictionary mapping category names to lists of labels for each observation in the evaluation dataset.

required
predictions list[bool]

A list of predicted diagnoses for each observation in the evaluation dataset.

required
true_statuses list[bool]

A list of true diagnoses for each observation in the evaluation dataset.

required

Returns:

Type Description
float

The false omission rate of the model in the specified intersectional group. Returns np.nan if the group has no observations.

intersect_fpr

Calculate false positive rate for an intersectional group.

An intersectional group is defined by membership in specific categories across multiple dimensions (e.g., specific age category and specific gender).

Parameters:

Name Type Description Default
group_labels_dict dict

Dictionary mapping category names to specific group labels that define the intersectional group (e.g., {'age': 'Older', 'gender': 'Female'}).

required
subject_labels_dict dict

Dictionary mapping category names to lists of labels for each observation in the evaluation dataset.

required
predictions list[bool]

A list of predicted diagnoses for each observation in the evaluation dataset.

required
true_statuses list[bool]

A list of true diagnoses for each observation in the evaluation dataset.

required

Returns:

Type Description
float

The false positive rate of the model in the specified intersectional group. Returns np.nan if the group has no observations.

max_intersect_acc_diff

Calculate the maximum difference in accuracy across intersectional groups.

Parameters:

Name Type Description Default
subject_labels_dict dict

Dictionary mapping category names to lists of labels for each observation in the evaluation dataset.

required
predictions list[bool]

A list of predicted diagnoses for each observation in the evaluation dataset.

required
true_statuses list[bool]

A list of true diagnoses for each observation in the evaluation dataset.

required

Returns:

Type Description
float

The maximum difference between any two intersectional group accuracies. Returns np.nan if any group has no observations.
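The largest gap between any two groups equals the maximum rate minus the minimum rate. A minimal sketch over a dictionary of per-group accuracies (the shape returned by all_intersect_accs) is shown below; `max_difference` and the numbers are illustrative only.

```python
import math
from typing import Mapping


def max_difference(group_rates: Mapping[str, float]) -> float:
    """Hypothetical helper: largest gap between any two per-group rates."""
    values = list(group_rates.values())
    if not values or any(math.isnan(v) for v in values):
        return math.nan  # an empty group makes the comparison undefined
    return max(values) - min(values)


accs = {"young + F": 0.90, "young + M": 0.75, "older + F": 0.80, "older + M": 0.85}
gap = max_difference(accs)
```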

max_intersect_acc_ratio

Calculate the maximum ratio of accuracies across intersectional groups.

Parameters:

Name Type Description Default
subject_labels_dict dict

Dictionary mapping category names to lists of labels for each observation in the evaluation dataset.

required
predictions list[bool]

A list of predicted diagnoses for each observation in the evaluation dataset.

required
true_statuses list[bool]

A list of true diagnoses for each observation in the evaluation dataset.

required
natural_log bool

If True, return the natural logarithm of the ratio. Default is True.

True

Returns:

Type Description
float

The (log) ratio of the maximum to minimum accuracy across all intersectional groups. Returns np.nan if any group has no observations or if any accuracy is 0.
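The ratio variant divides the best group's accuracy by the worst group's accuracy and optionally takes the natural logarithm. The sketch below assumes per-group accuracies are already available as a dictionary; `max_ratio` and the numbers are hypothetical.

```python
import math
from typing import Mapping


def max_ratio(group_rates: Mapping[str, float], natural_log: bool = True) -> float:
    """Hypothetical helper: (log) ratio of the largest to the smallest rate."""
    values = list(group_rates.values())
    if not values or any(math.isnan(v) or v == 0 for v in values):
        return math.nan  # undefined for empty groups or zero rates
    ratio = max(values) / min(values)
    return math.log(ratio) if natural_log else ratio


accs = {"young + F": 0.9, "older + M": 0.6, "older + F": 0.75}
plain = max_ratio(accs, natural_log=False)
logged = max_ratio(accs)
```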

max_intersect_fdr_diff

Calculate the maximum difference in false discovery rate across all intersectional groups.

Parameters:

Name Type Description Default
subject_labels_dict dict

Dictionary mapping category names to lists of labels for each observation in the evaluation dataset.

required
predictions list[bool]

A list of predicted diagnoses for each observation in the evaluation dataset.

required
true_statuses list[bool]

A list of true diagnoses for each observation in the evaluation dataset.

required

Returns:

Type Description
float

The difference between the maximum and minimum false discovery rate across all intersectional groups. Returns np.nan if any group has no observations.
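For reference, the false discovery rate is the share of positive predictions that are wrong, FDR = FP / (FP + TP). The sketch below computes it for two already-separated groups and takes the difference. The helper name, the group names and the data are made up for illustration, and returning NaN when a group has no positive predictions is an assumption.

```python
import math

def fdr_sketch(predictions, true_statuses):
    """Illustrative only: FDR = FP / (FP + TP)."""
    fp = sum(1 for p, t in zip(predictions, true_statuses) if p and not t)
    tp = sum(1 for p, t in zip(predictions, true_statuses) if p and t)
    if fp + tp == 0:
        return math.nan  # assumption: undefined without positive predictions
    return fp / (fp + tp)

group_fdrs = {
    # 3 positive predictions, 1 of them wrong
    "Older+Female": fdr_sketch([True, True, False, True], [True, False, False, True]),
    # 2 positive predictions, 1 of them wrong
    "Younger+Male": fdr_sketch([True, True, False, False], [False, True, False, True]),
}
fdr_diff = max(group_fdrs.values()) - min(group_fdrs.values())
```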

max_intersect_fdr_ratio

Calculate the ratio of the maximum to minimum false discovery rate across all intersectional groups.

Parameters:

Name Type Description Default
subject_labels_dict dict

Dictionary mapping category names to lists of labels for each observation in the evaluation dataset.

required
predictions list[bool]

A list of predicted diagnoses for each observation in the evaluation dataset.

required
true_statuses list[bool]

A list of true diagnoses for each observation in the evaluation dataset.

required
natural_log bool

If True, return the natural logarithm of the ratio. Default is True.

True

Returns:

Type Description
float

The (log) ratio of the maximum to minimum false discovery rate across all intersectional groups. Returns np.nan if any group has no observations or if any false discovery rate is 0.

max_intersect_fnr_ratio

Calculate the ratio of the maximum to minimum false negative rate across all intersectional groups.

Parameters:

Name Type Description Default
subject_labels_dict dict

Dictionary mapping category names to lists of labels for each observation in the evaluation dataset.

required
predictions list[bool]

A list of predicted diagnoses for each observation in the evaluation dataset.

required
true_statuses list[bool]

A list of true diagnoses for each observation in the evaluation dataset.

required
natural_log bool

If True, return the natural logarithm of the ratio. Default is True.

True

Returns:

Type Description
float

The (log) ratio of the maximum to minimum false negative rate across all intersectional groups. Returns np.nan if any group has no observations or if any false negative rate is 0.

max_intersect_for_diff

Calculate the maximum difference in false omission rate across all intersectional groups.

Parameters:

Name Type Description Default
subject_labels_dict dict

Dictionary mapping category names to lists of labels for each observation in the evaluation dataset.

required
predictions list[bool]

A list of predicted diagnoses for each observation in the evaluation dataset.

required
true_statuses list[bool]

A list of true diagnoses for each observation in the evaluation dataset.

required

Returns:

Type Description
float

The difference between the maximum and minimum false omission rate across all intersectional groups. Returns np.nan if any group has no observations.

max_intersect_for_ratio

Calculate the ratio of the maximum to minimum false omission rate across all intersectional groups.

Parameters:

Name Type Description Default
subject_labels_dict dict

Dictionary mapping category names to lists of labels for each observation in the evaluation dataset.

required
predictions list[bool]

A list of predicted diagnoses for each observation in the evaluation dataset.

required
true_statuses list[bool]

A list of true diagnoses for each observation in the evaluation dataset.

required
natural_log bool

If True, return the natural logarithm of the ratio. Default is True.

True

Returns:

Type Description
float

The (log) ratio of the maximum to minimum false omission rate across all intersectional groups. Returns np.nan if any group has no observations or if any false omission rate is 0.
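The false omission rate is the share of negative predictions that are wrong, FOR = FN / (FN + TN). The sketch below uses made-up data to show why a rate of 0 in any group leaves the ratio undefined: the minimum becomes the denominator. The helper name is hypothetical and math.nan stands in for np.nan.

```python
import math

def false_omission_rate_sketch(predictions, true_statuses):
    """Illustrative only: FOR = FN / (FN + TN)."""
    fn = sum(1 for p, t in zip(predictions, true_statuses) if not p and t)
    tn = sum(1 for p, t in zip(predictions, true_statuses) if not p and not t)
    if fn + tn == 0:
        return math.nan  # assumption: undefined without negative predictions
    return fn / (fn + tn)

group_fors = {
    # 3 negative predictions, 1 of them a missed positive
    "Older+Female": false_omission_rate_sketch([False, False, True, False], [True, False, True, False]),
    # 2 negative predictions, both correct
    "Younger+Male": false_omission_rate_sketch([False, False, True, True], [False, False, True, False]),
}

if any(math.isnan(r) or r == 0 for r in group_fors.values()):
    for_log_ratio = math.nan  # dividing by a minimum of 0 is undefined
else:
    for_log_ratio = math.log(max(group_fors.values()) / min(group_fors.values()))
```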

max_intersect_fpr_diff

Calculate the maximum difference in false positive rate across all intersectional groups.

Parameters:

Name Type Description Default
subject_labels_dict dict

Dictionary mapping category names to lists of labels for each observation in the evaluation dataset.

required
predictions list[bool]

A list of predicted diagnoses for each observation in the evaluation dataset.

required
true_statuses list[bool]

A list of true diagnoses for each observation in the evaluation dataset.

required

Returns:

Type Description
float

The difference between the maximum and minimum false positive rate across all intersectional groups. Returns np.nan if any group has no observations.

max_intersect_fpr_ratio

Calculate the ratio of the maximum to minimum false positive rate across all intersectional groups.

Parameters:

Name Type Description Default
subject_labels_dict dict

Dictionary mapping category names to lists of labels for each observation in the evaluation dataset.

required
predictions list[bool]

A list of predicted diagnoses for each observation in the evaluation dataset.

required
true_statuses list[bool]

A list of true diagnoses for each observation in the evaluation dataset.

required
natural_log bool

If True, return the natural logarithm of the ratio. Default is True.

True

Returns:

Type Description
float

The (log) ratio of the maximum to minimum false positive rate across all intersectional groups. Returns np.nan if any group has no observations or if any false positive rate is 0.

fairness.single_metrics

calculate_AOD

Compute the Average Odds Difference (AOD) between demographic groups.

Average Odds Difference measures the average difference in both True Positive Rates (TPR) and False Positive Rates (FPR) between the underprivileged and privileged groups. It captures disparities in model performance for both positive and negative outcomes.

Parameters:

Name Type Description Default
y_test array-like of shape (n_samples,)

Ground-truth binary labels. Expected values: 0 (negative outcome) or 1 (positive outcome).

required
y_pred array-like of shape (n_samples,)

Predicted binary labels from a classifier. Expected values: 0 (negative outcome) or 1 (positive outcome).

required
group_labels

Protected attribute labels. Each entry corresponds to the same-indexed sample in y_test and y_pred.

required
privileged_label str

The label within group_labels considered to be the privileged group (e.g. 'Male' for sex, 'Older' for age). All other labels are treated as unprivileged.

required

Returns:

Name Type Description
AOD float

Average Odds Difference, defined as:

AOD = 0.5 × [ (FPR_underprivileged − FPR_privileged)
            + (TPR_underprivileged − TPR_privileged) ]

Values closer to 0 indicate better fairness.
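A worked example of the formula, written as a standalone sketch rather than a call into the library. The names tpr_fpr and aod_sketch are hypothetical, and the code assumes each group contains at least one positive and one negative ground-truth label.

```python
def tpr_fpr(y_test, y_pred):
    """Return (TPR, FPR) for binary 0/1 labels."""
    tp = sum(1 for t, p in zip(y_test, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_test, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_test, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_test, y_pred) if t == 0 and p == 0)
    return tp / (tp + fn), fp / (fp + tn)

def aod_sketch(y_test, y_pred, group_labels, privileged_label):
    """Illustrative only: 0.5 * [(FPR_u - FPR_p) + (TPR_u - TPR_p)]."""
    rows = list(zip(y_test, y_pred, group_labels))
    priv = [(t, p) for t, p, g in rows if g == privileged_label]
    unpriv = [(t, p) for t, p, g in rows if g != privileged_label]
    tpr_p, fpr_p = tpr_fpr(*zip(*priv))
    tpr_u, fpr_u = tpr_fpr(*zip(*unpriv))
    return 0.5 * ((fpr_u - fpr_p) + (tpr_u - tpr_p))

y_test = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
groups = ["Male"] * 4 + ["Female"] * 4

aod = aod_sketch(y_test, y_pred, groups, privileged_label="Male")
```

For this data the privileged group has TPR 0.5 and FPR 0.0, the underprivileged group has TPR 1.0 and FPR 0.5, so AOD = 0.5 × (0.5 + 0.5) = 0.5. Because the differences are signed, a positive value means the underprivileged group has the higher rates.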

calculate_DI

Compute Disparate Impact (DI) between demographic groups.

Disparate Impact measures the ratio of positive prediction rates between the underprivileged and privileged groups. It evaluates whether one group receives favorable outcomes less frequently than another, regardless of ground-truth labels.

Parameters:

Name Type Description Default
y_pred array-like of shape (n_samples,)

Predicted binary labels from a classifier. Expected values: 0 (negative outcome) or 1 (positive outcome).

required
group_labels

Group labels for the protected attribute. Each entry corresponds to the same-indexed sample in y_pred.

required
privileged_label str

The label within group_labels considered to be the privileged group (e.g. 'Male' for sex, 'Older' for age). All other labels are treated as unprivileged.

required

Returns:

Name Type Description
DI float

Disparate Impact, defined as:

DI = P(ŷ = 1 | underprivileged) / P(ŷ = 1 | privileged)

where P(ŷ = 1 | group) is the positive prediction rate for the specified group.
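The ratio can be reproduced by hand with a few lines of standalone Python. The name di_sketch is hypothetical, and the sketch assumes both groups are non-empty and that the privileged group receives at least one positive prediction.

```python
def di_sketch(y_pred, group_labels, privileged_label):
    """Illustrative only: P(y_hat = 1 | underprivileged) / P(y_hat = 1 | privileged)."""
    priv = [p for p, g in zip(y_pred, group_labels) if g == privileged_label]
    unpriv = [p for p, g in zip(y_pred, group_labels) if g != privileged_label]
    rate_priv = sum(priv) / len(priv)        # positive prediction rate, privileged
    rate_unpriv = sum(unpriv) / len(unpriv)  # positive prediction rate, underprivileged
    return rate_unpriv / rate_priv

y_pred = [1, 1, 1, 1, 1, 0, 1, 0]
groups = ["Male"] * 4 + ["Female"] * 4

di = di_sketch(y_pred, groups, privileged_label="Male")
```

Here the privileged group is predicted positive in 4 of 4 cases and the underprivileged group in 2 of 4, giving DI = 0.5. A value of 1 means both groups receive positive predictions at the same rate.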

calculate_EOD

Compute the Equal Opportunity Difference (EOD) between demographic groups.

Equal Opportunity Difference measures the absolute difference in True Positive Rates (TPR) between the underprivileged and privileged groups. A lower EOD indicates fairer performance with respect to correctly identifying positive cases across groups.

Parameters:

Name Type Description Default
y_test array-like of shape (n_samples,)

Ground-truth binary labels. Expected values: 0 (negative outcome) or 1 (positive outcome).

required
y_pred array-like of shape (n_samples,)

Predicted binary labels from a classifier. Expected values: 0 (negative outcome) or 1 (positive outcome).

required
group_labels

Group labels for the protected attribute. Each entry corresponds to the same-indexed sample in y_test and y_pred.

required
privileged_label str

The label within group_labels considered to be the privileged group (e.g. 'Male' for sex, 'Older' for age). All other labels are treated as unprivileged.

required

Returns:

Name Type Description
EOD float

Equal Opportunity Difference, defined as:

EOD = |TPR_underprivileged − TPR_privileged|

Values closer to 0 indicate better fairness.

Notes
  • EOD focuses exclusively on the positive class (y = 1).
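A short standalone sketch of the definition above. The helper names are hypothetical, and the code assumes every group contains at least one positive ground-truth label.

```python
def true_positive_rate(y_test, y_pred):
    """TPR: share of actual positives (y = 1) that were predicted positive."""
    preds_for_positives = [p for t, p in zip(y_test, y_pred) if t == 1]
    return sum(preds_for_positives) / len(preds_for_positives)

def eod_sketch(y_test, y_pred, group_labels, privileged_label):
    """Illustrative only: |TPR_underprivileged - TPR_privileged|."""
    rows = list(zip(y_test, y_pred, group_labels))
    priv = [(t, p) for t, p, g in rows if g == privileged_label]
    unpriv = [(t, p) for t, p, g in rows if g != privileged_label]
    return abs(true_positive_rate(*zip(*unpriv)) - true_positive_rate(*zip(*priv)))

y_test = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
groups = ["Male"] * 4 + ["Female"] * 4

eod = eod_sketch(y_test, y_pred, groups, privileged_label="Male")
```

The false positive among the Female samples does not affect the result, since only samples with y = 1 enter the TPR. The absolute value also means EOD does not show which group is favored.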

calculate_TPR_TNR_FPR_FNR

Compute classification rate metrics derived from the confusion matrix.

Notes
  • Counts must be non-negative integers.
  • Label 1 is assumed to be the positive outcome.
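The four rates follow from the confusion-matrix counts in the standard way. The sketch below is a guess at the shape of such a helper: the argument order, the return order (taken from the function name) and the NaN fallback for empty denominators are all assumptions.

```python
import math

def rates_from_counts_sketch(tp, fn, fp, tn):
    """Illustrative only: return (TPR, TNR, FPR, FNR) from non-negative counts."""
    def safe_div(num, den):
        # Assumption: an empty denominator yields NaN rather than an error.
        return num / den if den else math.nan

    tpr = safe_div(tp, tp + fn)  # sensitivity / recall
    tnr = safe_div(tn, tn + fp)  # specificity
    fpr = safe_div(fp, fp + tn)  # 1 - TNR
    fnr = safe_div(fn, fn + tp)  # 1 - TPR
    return tpr, tnr, fpr, fnr

tpr, tnr, fpr, fnr = rates_from_counts_sketch(tp=3, fn=1, fp=2, tn=4)
```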

calculate_TP_FN_FP_TN

Compute the confusion matrix components: True Positives (TP), False Negatives (FN), False Positives (FP), and True Negatives (TN).

Notes
  • Binary classification is assumed.
  • Label 1 denotes the positive outcome.
  • Label 0 denotes the negative outcome.
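Under the labelling convention in the notes, the four counts can be tallied as in this sketch. The function name and the (TP, FN, FP, TN) return order, which mirrors the documented function's name, are assumptions.

```python
def confusion_counts_sketch(y_test, y_pred):
    """Illustrative only: count TP, FN, FP, TN for binary 0/1 labels."""
    tp = fn = fp = tn = 0
    for truth, pred in zip(y_test, y_pred):
        if truth == 1 and pred == 1:
            tp += 1  # positive case, correctly flagged
        elif truth == 1 and pred == 0:
            fn += 1  # positive case, missed
        elif truth == 0 and pred == 1:
            fp += 1  # negative case, wrongly flagged
        else:
            tn += 1  # negative case, correctly cleared
    return tp, fn, fp, tn

counts = confusion_counts_sketch([1, 0, 1, 0, 1], [1, 0, 0, 1, 1])
```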

group_to_binary

Adapts single fairness functions to the intersectional ones.

labels: list of group labels (e.g. 'Male', 'Female')
privileged_label: label considered privileged
returns: numpy array (1 = privileged, 0 = unprivileged)
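The encoding amounts to a one-line comparison. The sketch below uses a plain Python list instead of a NumPy array to stay dependency-free, and the name group_to_binary_sketch is hypothetical.

```python
def group_to_binary_sketch(labels, privileged_label):
    """Illustrative only: 1 for the privileged label, 0 for every other label."""
    return [1 if label == privileged_label else 0 for label in labels]

binary = group_to_binary_sketch(["Male", "Female", "Male", "Other"], privileged_label="Male")
```

Every label other than privileged_label maps to 0, so attributes with more than two values are collapsed into a single unprivileged group.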

fairness.visualisation

Visualization helpers for fairness metrics.

This module contains lightweight plotting utilities that sit on top of the fairness.metrics and fairness.single_metrics APIs. The functions do not compute metrics themselves; they only visualize metric outputs computed from group labels, predictions, and ground-truth labels.

The typical workflow is:

  1) Prepare evaluation inputs (see fairness.groups.make_eval_df and fairness.adapters).
  2) Compute or select a metric function from fairness.metrics or fairness.single_metrics.
  3) Use the plotting helpers here to visualize metric values across groups.

All plotting helpers return a Matplotlib Figure so callers can further customize or save the plots as needed.

plot_group_metric

Plot a group-level metric computed with fairness.metrics (group_*).

This function expects a metric that takes a single group label, a list of subject labels, predictions, and true labels, and returns a scalar value for that group (e.g., group_acc, group_fnr, group_fpr).

Parameters:

Name Type Description Default
metric_fn callable

A function from fairness.metrics with signature: (group_label, subject_labels, predictions, true_statuses) -> float.

required
subject_labels Iterable

Group label for each sample (e.g., intersectional labels).

required
predictions Iterable

Predicted labels aligned with subject_labels.

required
true_statuses Iterable

Ground-truth labels aligned with subject_labels.

required
groups Sequence or None

Subset/ordering of groups to plot. If None, all unique labels are used.

None
title str or None

Plot title. Defaults to the metric function name.

None
rotation int

Rotation angle for x tick labels.

45
figsize tuple[float, float] or None

Figure size in inches. If None, a default size is chosen.

None
sort bool

If True, sort bars by metric value (NaNs placed at the end).

False

Returns:

Type Description
Figure

The created Matplotlib figure.

Raises:

Type Description
ValueError

If inputs do not share the same length.
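The sort option's "NaNs placed at the end" behaviour can be pictured with the sketch below, which orders precomputed per-group values. The helper name is hypothetical and the ascending sort direction is an assumption; only the NaN placement is taken from the description above.

```python
import math

def order_groups_sketch(values_by_group):
    """Illustrative only: sort groups by value, keeping NaN groups last."""
    finite = [(g, v) for g, v in values_by_group.items() if not math.isnan(v)]
    missing = [(g, v) for g, v in values_by_group.items() if math.isnan(v)]
    finite.sort(key=lambda item: item[1])  # assumption: ascending order
    return finite + missing

ordered = order_groups_sketch(
    {"Older+Female": 0.9, "Younger+Male": math.nan, "Older+Male": 0.4}
)
ordered_names = [group for group, _ in ordered]
```

Separating NaN values before sorting matters because comparisons with NaN are always false, so a naive sort on a list that contains NaN gives an unreliable order.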

plot_group_metric_from_eval_df

Convenience wrapper for an eval_df produced by fairness.groups.make_eval_df.

Parameters:

Name Type Description Default
metric_fn callable

A fairness.metrics group_* function.

required
eval_df DataFrame

DataFrame with columns label_col, y_pred, and y_true.

required
label_col str

Column name for group labels (default "subject_label").

'subject_label'
title str or None

Plot title.

None
rotation int

Rotation angle for x tick labels.

45
figsize tuple[float, float] or None

Figure size in inches.

None
sort bool

If True, sort bars by metric value (NaNs placed at the end).

False

Returns:

Type Description
Figure

The created Matplotlib figure.

Raises:

Type Description
ValueError

If required columns are missing from eval_df.

plot_intersectional_metric

Plot an all_intersect_* metric from fairness.metrics (dict -> bar plot).

Functions such as all_intersect_accs, all_intersect_fprs, etc. return a dictionary mapping intersectional group labels to metric values. This helper converts that dictionary into a horizontal bar plot.

Parameters:

Name Type Description Default
metric_fn callable

An all_intersect_* function with signature: (subject_labels_dict, predictions, true_statuses) -> dict.

required
subject_labels_dict Mapping[str, Sequence]

Mapping from protected attribute name to labels per sample.

required
predictions Iterable

Predicted labels aligned with subject_labels_dict values.

required
true_statuses Iterable

Ground-truth labels aligned with subject_labels_dict values.

required
title str or None

Plot title. Defaults to the metric function name.

None
rotation int

Rotation angle for tick labels.

0
figsize tuple[float, float] or None

Figure size in inches.

None
sort bool

If True, sort bars by metric value (NaNs placed at the end).

True

Returns:

Type Description
Figure

The created Matplotlib figure.

Raises:

Type Description
ValueError

If predictions and true_statuses lengths differ.

TypeError

If metric_fn does not return a dictionary.

plot_pairwise_group_metric

Plot pairwise group metrics (group_diff, group_ratio).

Pairwise metric functions compare two groups at a time and return a scalar (e.g., difference or ratio of accuracies).

Parameters:

Name Type Description Default
metric_fn callable

A function from fairness.metrics with signature: (group_a, group_b, subject_labels, predictions, true_statuses) -> float.

required
subject_labels Iterable

Group label for each sample.

required
predictions Iterable

Predicted labels aligned with subject_labels.

required
true_statuses Iterable

Ground-truth labels aligned with subject_labels.

required
group_pairs Sequence[tuple] or None

Explicit list of (group_a, group_b) pairs to plot. If None, all pairwise combinations of unique groups are used.

None
title str or None

Plot title. Defaults to the metric function name.

None
rotation int

Rotation angle for x tick labels (used for vertical plots only).

45
figsize tuple[float, float] or None

Figure size in inches.

None
sort bool

If True, sort bars by metric value (NaNs placed at the end).

True

Returns:

Type Description
Figure

The created Matplotlib figure.

Raises:

Type Description
ValueError

If no group pairs are provided or generated.
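When group_pairs is None, the pairs are derived from the unique group labels. One way to build them is sketched below. The helper name is hypothetical, and preserving first-seen label order and generating unordered pairs (each pair once, no self-pairs) are assumptions.

```python
from itertools import combinations

def default_group_pairs_sketch(subject_labels):
    """Illustrative only: all unordered pairs of unique group labels."""
    unique_groups = list(dict.fromkeys(subject_labels))  # de-duplicate, keep order
    pairs = list(combinations(unique_groups, 2))
    if not pairs:
        # Mirrors the documented ValueError when no pairs can be generated.
        raise ValueError("No group pairs were provided or generated.")
    return pairs

pairs = default_group_pairs_sketch(["A", "B", "A", "C"])

try:
    default_group_pairs_sketch(["A", "A"])
    single_group_raised = False
except ValueError:
    single_group_raised = True
```

With k unique groups this produces k × (k − 1) / 2 bars, so passing an explicit group_pairs list keeps the plot readable when there are many groups.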

plot_scalar_metrics

Plot one or more scalar metrics (e.g., max_intersect_* outputs).

Parameters:

Name Type Description Default
metrics Mapping[str, float]

Mapping from metric name to scalar value.

required
title str or None

Plot title.

None
rotation int

Rotation angle for x tick labels.

0
figsize tuple[float, float] or None

Figure size in inches.

None

Returns:

Type Description
Figure

The created Matplotlib figure.

plot_single_metrics

Plot single-attribute fairness metrics from fairness.single_metrics.

This helper computes and visualizes metrics such as EOD, AOD, and DI for a single protected attribute with a specified privileged group. Note that DI uses only predictions, while EOD and AOD require y_test.

Parameters:

Name Type Description Default
y_test Iterable

Ground-truth binary labels (0/1).

required
y_pred Iterable

Predicted binary labels (0/1).

required
group_labels Iterable

Protected attribute labels aligned to y_test/y_pred.

required
privileged_label object

Label treated as the privileged group.

required
metrics Sequence[str] or None

Subset of {"EOD", "AOD", "DI"} to compute. Defaults to all.

None
title str or None

Plot title.

None
rotation int

Rotation angle for x tick labels.

0
figsize tuple[float, float] or None

Figure size in inches.

None

Returns:

Type Description
Figure

The created Matplotlib figure.

Raises:

Type Description
ValueError

If an unknown metric name is requested.