In the code below, I have included 5 records for reproducibility. Most of the parameters I am using come directly from the source code example that instantiates the COMPAS dataset, but I cannot convert the DataFrame into a StandardDataset: it raises a KeyError on 'sex' in protected_attribute_names. Any other column name in that parameter list ends up with the same KeyError ('race', for example). I also tried an integer in case it was looking at row information, and that still raised the KeyError.
Python: 3.8.10
Pandas: 1.3.5
AIF360: 0.5.0
WARNING:root:Missing Data: 2 rows removed from StandardDataset.
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/usr/local/lib/python3.8/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3360             try:
-> 3361                 return self._engine.get_loc(casted_key)
   3362             except KeyError as err:

5 frames
/usr/local/lib/python3.8/dist-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
/usr/local/lib/python3.8/dist-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'race'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
<ipython-input-103-af5cab624a37> in <module>
      1 from aif360.datasets import StandardDataset
      2
----> 3 aif = StandardDataset(test_df,
      4                       label_name='jail7',
      5                       favorable_classes=[0],

/usr/local/lib/python3.8/dist-packages/aif360/datasets/standard_dataset.py in __init__(self, df, label_name, favorable_classes, protected_attribute_names, privileged_classes, instance_weights_name, scores_name, categorical_features, features_to_keep, features_to_drop, na_values, custom_preprocessing, metadata)
    113             if callable(vals):
    114                 df[attr] = df[attr].apply(vals)
--> 115             elif np.issubdtype(df[attr].dtype, np.number):
    116                 # this attribute is numeric; no remapping needed
    117                 privileged_values = vals

/usr/local/lib/python3.8/dist-packages/pandas/core/frame.py in __getitem__(self, key)
   3456             if self.columns.nlevels > 1:
   3457                 return self._getitem_multilevel(key)
-> 3458             indexer = self.columns.get_loc(key)
   3459             if is_integer(indexer):
   3460                 indexer = [indexer]

/usr/local/lib/python3.8/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3361                 return self._engine.get_loc(casted_key)
   3362             except KeyError as err:
-> 3363                 raise KeyError(key) from err
   3364
   3365         if is_scalar(key) and isna(key) and not self.hasnans:

KeyError: 'race'
import numpy as np
import pandas as pd
from aif360.datasets import StandardDataset

nan = np.nan  # the dict below was pasted from df.to_dict(), which prints bare `nan`

# Five sample records for reproducibility
data = {'age': {0: 69, 1: 34, 2: 24, 3: 23, 4: 43},
        'age_cat': {0: 'Greater than 45', 1: '25 - 45', 2: 'Less than 25', 3: 'Less than 25', 4: '25 - 45'},
        'sex': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1},
        'race': {0: 'Other', 1: 'African-American', 2: 'African-American', 3: 'African-American', 4: 'Other'},
        'c_charge_degree': {0: 'Felony', 1: 'Felony', 2: 'Felony', 3: 'Felony', 4: 'Felony'},
        'priors_count': {0: 0, 1: 0, 2: 4, 3: 1, 4: 2},
        'days_b_screening_arrest': {0: -1.0, 1: -1.0, 2: -1.0, 3: nan, 4: nan},
        'decile_score': {0: 1, 1: 3, 2: 4, 3: 8, 4: 1},
        'score_text': {0: 'Low', 1: 'Low', 2: 'Low', 3: 'High', 4: 'Low'},
        'is_recid': {0: 0, 1: 1, 2: 1, 3: 0, 4: 0},
        'two_year_recid': {0: 0, 1: 1, 2: 1, 3: 0, 4: 0},
        'hours_in_jail': {0: 23.627222222222223, 1: 241.85722222222222, 2: 26.058333333333334, 3: nan, 4: nan},
        'jail7': {0: False, 1: False, 2: False, 3: True, 4: False}}
df = pd.DataFrame.from_dict(data)

# This call raises the KeyError shown in the traceback above
aif = StandardDataset(df,
                      label_name='jail7',
                      favorable_classes=[0],
                      protected_attribute_names=['sex', 'race'],
                      privileged_classes=[['Female'], ['Caucasian']],
                      categorical_features=['age_cat', 'sex', 'c_charge_degree', 'score_text', 'race'],
                      features_to_keep=['age', 'age_cat', 'sex', 'race', 'c_charge_degree',
                                        'priors_count', 'days_b_screening_arrest', 'decile_score',
                                        'score_text', 'is_recid', 'two_year_recid',
                                        'hours_in_jail', 'jail7'])
I changed the values in protected_attribute_names and tried reducing the list from two entries to one. I also tried calling it without passing that parameter at all, but it is required.
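A possible lead, stated as an assumption rather than a confirmed diagnosis: the traceback fails at `df[attr]` inside `StandardDataset.__init__`, which suggests the column no longer exists by the time the protected attributes are looked up. If AIF360 one-hot encodes everything in `categorical_features` before that step (I have not verified this against the 0.5.0 source), then listing 'sex' and 'race' there would rename those columns away. The pandas-only sketch below shows the renaming effect in isolation; the `prefix_sep="="` choice is a guess at what the library does.

```python
import pandas as pd

# Two of the columns from the reproduction above, with the same values.
df = pd.DataFrame({
    "sex": [1, 1, 1, 1, 1],
    "race": ["Other", "African-American", "African-American",
             "African-American", "Other"],
})

# Assumption: StandardDataset encodes categorical_features roughly like this
# before it reads the protected attribute columns.
encoded = pd.get_dummies(df, columns=["sex", "race"], prefix_sep="=")

# The original column names are replaced by one column per category value.
print(list(encoded.columns))
print("race" in encoded.columns)

try:
    encoded["race"]
except KeyError as err:
    print("KeyError:", err)
```

If this is what happens, leaving the protected attributes out of `categorical_features` might be enough to avoid the error, though the string values in `privileged_classes` would still need to match how 'sex' is actually coded in the data.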