
In the code below, I have included 5 records for reproducibility. Most of the parameters I am using come directly from the AIF360 source code example that instantiates the COMPAS dataset, but I cannot convert the DataFrame into a StandardDataset: it raises a KeyError on 'sex' in protected_attribute_names. If I use any other column name in that parameter list, I still end up with the same KeyError (on 'race', for example). I also tried an integer in case it was looking at row information, and that still raised the KeyError.

Python: 3.8.10
Pandas: 1.3.5
AIF360: 0.5.0

    WARNING:root:Missing Data: 2 rows removed from StandardDataset.

    KeyError                                  Traceback (most recent call last)
    /usr/local/lib/python3.8/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
       3360             try:
    -> 3361                 return self._engine.get_loc(casted_key)
       3362             except KeyError as err:

    5 frames
    /usr/local/lib/python3.8/dist-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

    /usr/local/lib/python3.8/dist-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

    pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

    pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

    KeyError: 'race'

    The above exception was the direct cause of the following exception:

    KeyError                                  Traceback (most recent call last)
    <ipython-input-103-af5cab624a37> in <module>
          1 from aif360.datasets import StandardDataset
          2
    ----> 3 aif = StandardDataset(test_df,
          4                       label_name='jail7',
          5                       favorable_classes=[0],

    /usr/local/lib/python3.8/dist-packages/aif360/datasets/standard_dataset.py in __init__(self, df, label_name, favorable_classes, protected_attribute_names, privileged_classes, instance_weights_name, scores_name, categorical_features, features_to_keep, features_to_drop, na_values, custom_preprocessing, metadata)
        113             if callable(vals):
        114                 df[attr] = df[attr].apply(vals)
    --> 115             elif np.issubdtype(df[attr].dtype, np.number):
        116                 # this attribute is numeric; no remapping needed
        117                 privileged_values = vals

    /usr/local/lib/python3.8/dist-packages/pandas/core/frame.py in __getitem__(self, key)
       3456             if self.columns.nlevels > 1:
       3457                 return self._getitem_multilevel(key)
    -> 3458             indexer = self.columns.get_loc(key)
       3459             if is_integer(indexer):
       3460                 indexer = [indexer]

    /usr/local/lib/python3.8/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
       3361                 return self._engine.get_loc(casted_key)
       3362             except KeyError as err:
    -> 3363                 raise KeyError(key) from err
       3364
       3365         if is_scalar(key) and isna(key) and not self.hasnans:

    KeyError: 'race'

Code to reproduce the error:

    import numpy as np
    import pandas as pd
    from aif360.datasets import StandardDataset

    # Five sample records; np.nan marks the missing values.
    data = {'age': {0: 69, 1: 34, 2: 24, 3: 23, 4: 43},
            'age_cat': {0: 'Greater than 45', 1: '25 - 45', 2: 'Less than 25', 3: 'Less than 25', 4: '25 - 45'},
            'sex': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1},
            'race': {0: 'Other', 1: 'African-American', 2: 'African-American', 3: 'African-American', 4: 'Other'},
            'c_charge_degree': {0: 'Felony', 1: 'Felony', 2: 'Felony', 3: 'Felony', 4: 'Felony'},
            'priors_count': {0: 0, 1: 0, 2: 4, 3: 1, 4: 2},
            'days_b_screening_arrest': {0: -1.0, 1: -1.0, 2: -1.0, 3: np.nan, 4: np.nan},
            'decile_score': {0: 1, 1: 3, 2: 4, 3: 8, 4: 1},
            'score_text': {0: 'Low', 1: 'Low', 2: 'Low', 3: 'High', 4: 'Low'},
            'is_recid': {0: 0, 1: 1, 2: 1, 3: 0, 4: 0},
            'two_year_recid': {0: 0, 1: 1, 2: 1, 3: 0, 4: 0},
            'hours_in_jail': {0: 23.627222222222223, 1: 241.85722222222222, 2: 26.058333333333334, 3: np.nan, 4: np.nan},
            'jail7': {0: False, 1: False, 2: False, 3: True, 4: False}}

    df = pd.DataFrame.from_dict(data)

    # This call raises the KeyError shown in the traceback.
    aif = StandardDataset(df,
                          label_name='jail7',
                          favorable_classes=[0],
                          protected_attribute_names=['sex', 'race'],
                          privileged_classes=[['Female'], ['Caucasian']],
                          categorical_features=['age_cat', 'sex', 'c_charge_degree', 'score_text', 'race'],
                          features_to_keep=['age', 'age_cat', 'sex', 'race', 'c_charge_degree',
                                            'priors_count', 'days_b_screening_arrest', 'decile_score',
                                            'score_text', 'is_recid', 'two_year_recid', 'hours_in_jail',
                                            'jail7'])

I changed the values in protected_attribute_names and reduced the list from 2 entries to 1. I also tried calling it without privileged_classes values, but they are required.
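
One thing that may be worth checking is whether the columns still exist by the time they are looked up. The sketch below uses only pandas and rests on an assumption that is not confirmed here: that StandardDataset one-hot encodes every column in categorical_features (along the lines of pd.get_dummies with '=' as the separator) before it reads the columns named in protected_attribute_names. The small DataFrame and the prefix_sep value are illustrative, not taken from AIF360 itself.

```python
import pandas as pd

# Two columns shaped like the ones in the reproduction above.
df = pd.DataFrame({
    "sex": [1, 1, 1],
    "race": ["Other", "African-American", "Other"],
})

# Assumption: this mirrors what happens to categorical_features inside
# StandardDataset; the '=' separator is also an assumption.
encoded = pd.get_dummies(df, columns=["sex", "race"], prefix_sep="=")

# The original columns are replaced by one indicator column per value,
# so a later lookup of 'sex' or 'race' would fail with a KeyError.
print(list(encoded.columns))
print("race" in encoded.columns)  # False
```

If that assumption holds, leaving 'sex' and 'race' out of categorical_features would keep those columns in place for the protected-attribute lookup. Because sex is stored as 1 in this data, privileged_classes would probably also need a numeric value for it rather than 'Female'.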
