Calculate group fairness metrics with AIF360

Question

I want to calculate group fairness metrics using AIF360. This is a sample dataset and model, in which gender is the protected attribute and income is the target.

import pandas as pd
from sklearn.svm import SVC
from aif360.sklearn import metrics

df = pd.DataFrame({'gender': [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
                  'experience': [0, 0.1, 0.2, 0.4, 0.5, 0.6, 0, 0.1, 0.2, 0.4, 0.5, 0.6],
                  'income': [0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1]})

clf = SVC(random_state=0).fit(df[['gender', 'experience']], df['income'])

y_pred = clf.predict(df[['gender', 'experience']])

metrics.statistical_parity_difference(y_true=df['income'], y_pred=y_pred, prot_attr='gender', priv_group=1, pos_label=1)

It throws:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-609692e52b2a> in <module>
     11 y_pred = clf.predict(X)
     12 
---> 13 metrics.statistical_parity_difference(y_true=df['income'], y_pred=y_pred, prot_attr='gender', priv_group=1, pos_label=1)

TypeError: statistical_parity_difference() got an unexpected keyword argument 'y_true'

I get a similar error for disparate_impact_ratio. It seems the data needs to be passed in differently, but I have not been able to figure out how.

score 3 · Accepted Answer · answered Oct 26 '20 at 18:35

One way to do this is to wrap the DataFrame in a StandardDataset and then call the fair_metrics function defined below, which uses the classes in aif360.metrics rather than aif360.sklearn:

from aif360.datasets import StandardDataset
from aif360.metrics import BinaryLabelDatasetMetric, ClassificationMetric

dataset = StandardDataset(df, 
                          label_name='income', 
                          favorable_classes=[1], 
                          protected_attribute_names=['gender'], 
                          privileged_classes=[[1]])

def fair_metrics(dataset, y_pred):
    # Copy the dataset and overwrite its labels with the model's predictions.
    dataset_pred = dataset.copy()
    dataset_pred.labels = y_pred

    # Use the first (here, the only) protected attribute: 'gender'.
    attr = dataset_pred.protected_attribute_names[0]
    idx = dataset_pred.protected_attribute_names.index(attr)
    privileged_groups = [{attr: dataset_pred.privileged_protected_attributes[idx][0]}]
    unprivileged_groups = [{attr: dataset_pred.unprivileged_protected_attributes[idx][0]}]

    # Compares the true labels (dataset) with the predicted labels (dataset_pred).
    classified_metric = ClassificationMetric(dataset, dataset_pred, unprivileged_groups=unprivileged_groups, privileged_groups=privileged_groups)

    # Looks at the predicted labels only.
    metric_pred = BinaryLabelDatasetMetric(dataset_pred, unprivileged_groups=unprivileged_groups, privileged_groups=privileged_groups)

    result = {'statistical_parity_difference': metric_pred.statistical_parity_difference(),
              'disparate_impact': metric_pred.disparate_impact(),
              'equal_opportunity_difference': classified_metric.equal_opportunity_difference()}

    return result


fair_metrics(dataset, y_pred)

which returns the correct results:

{'statistical_parity_difference': -0.6666666666666667,
 'disparate_impact': 0.3333333333333333,
 'equal_opportunity_difference': 0.0}
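
As a sanity check, these three numbers can be reproduced by hand from per-group rates. The sketch below is plain Python, not AIF360 code. The y_pred list is an assumption: the post never prints the SVC output, so the values were chosen to be consistent with the reported metrics. The helper names selection_rate and true_positive_rate are hypothetical.

```python
# Data from the question; y_pred is ASSUMED (the original post does not show it).
gender = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_true = [0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1]
y_pred = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]


def selection_rate(group):
    """Fraction of the group predicted as the favorable class (1)."""
    preds = [p for g, p in zip(gender, y_pred) if g == group]
    return sum(preds) / len(preds)


def true_positive_rate(group):
    """Fraction of the group's actual positives that were predicted positive."""
    preds = [p for g, t, p in zip(gender, y_true, y_pred) if g == group and t == 1]
    return sum(preds) / len(preds)


# Unprivileged group is gender == 0, privileged group is gender == 1.
spd = selection_rate(0) - selection_rate(1)          # difference of selection rates
di = selection_rate(0) / selection_rate(1)           # ratio of selection rates
eod = true_positive_rate(0) - true_positive_rate(1)  # difference of true positive rates

print(spd, di, eod)
```

With these assumed predictions, the unprivileged group is selected 2 times out of 6 and the privileged group 6 times out of 6, which gives the -2/3 difference and the 1/3 ratio shown above.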

score 1 · Answer 2 · Bill Huang · 2020-10-23T22:47:39.763

Remove the y_true= and y_pred= keywords from the function call and retry. As one can see in the documentation, *y in the function prototype collects an arbitrary number of positional arguments, so this is the most likely cause.

In other words, y_true and y_pred are not named parameters in this signature. They are collected positionally by *y, so they cannot be passed by name. Arbitrary keyword arguments would be expressed as **kwargs in a function prototype.
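
The behaviour described here can be reproduced without AIF360. In the minimal sketch below, parity_like_metric is a hypothetical stand-in that only mimics a `*y` style signature; it is not the library's function and computes no fairness metric.

```python
def parity_like_metric(*y, prot_attr=None, priv_group=1, pos_label=1):
    """Hypothetical stand-in that only mimics a `*y` style signature."""
    # Everything passed positionally is collected into the tuple `y`.
    return len(y), prot_attr


# Positional targets plus keyword-only options: accepted.
ok = parity_like_metric([0, 1], [1, 1], prot_attr='gender')
print(ok)

# Naming the targets fails, because no parameter is called `y_true`.
message = None
try:
    parity_like_metric(y_true=[0, 1], y_pred=[1, 1], prot_attr='gender')
except TypeError as exc:
    message = str(exc)
    print(message)
```

The second call raises the same kind of TypeError as in the question, while the options after `*y` (such as prot_attr) can only be passed by name.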

edited Oct 23 '20 at 22:47

answered Oct 23 '20 at 20:47

Bill Huang


Thanks. That resolved the original error, but now it throws `ValueError: Some of the attributes provided are not present in the dataset`, which makes sense given that `df["gender"]` is not passed to the function. – Reveille Oct 23 '20 at 21:03

I'd bet the problem is now in the data, as the error message is now a ValueError on the dataset property. It is now unrelated to the function call itself. – Bill Huang Oct 23 '20 at 22:49

score 0 · Answer 3 · answered Dec 29 '22 at 23:08

I had the same problem. y_pred_default was a NumPy array, while the rest of the dataset was a DataFrame. If you convert y_pred_default to a DataFrame on its own, its rows no longer line up with the dataset's rows, and combining the two produces NaN values in the new dataset. So I converted the dataset to a NumPy array, concatenated it with the y_pred_default array, and converted the result back to a DataFrame. You also have to restore the original column names, because after the round trip they are replaced by integers. This gives you exactly what you want: a DataFrame with your X values and the corresponding predicted y values, from which you can compute the statistical parity difference (SPD) metric.
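
One plausible reading of the NaN problem above is an index mismatch: pandas aligns on index labels, not on position. The sketch below assumes that reading. X_test and its index values are hypothetical, and y_pred_default stands in for the output of clf.predict. It shows the misaligned case and an alternative that avoids the NumPy round trip by reusing the frame's own index.

```python
import numpy as np
import pandas as pd

# Hypothetical test split whose index is not 0..n-1 (e.g. after a shuffle).
X_test = pd.DataFrame({'gender': [0, 1, 1], 'experience': [0.2, 0.4, 0.6]},
                      index=[10, 11, 12])
y_pred_default = np.array([0, 1, 1])  # stand-in for clf.predict(X_test)

# Wrapping the array in a Series gives it the index 0..2, which matches
# none of the labels 10..12, so pandas fills the new column with NaN.
misaligned = X_test.copy()
misaligned['income'] = pd.Series(y_pred_default)

# Reusing the frame's own index keeps each prediction on its row,
# and the original column names are preserved.
aligned = X_test.copy()
aligned['income'] = pd.Series(y_pred_default, index=X_test.index)

print(misaligned['income'].isna().all())
print(aligned['income'].tolist())
```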
