Getting error: KeyError: 'Only the Series name can be used for the key in Series dtype mappings.' when applying SMOTE to a pandas DataFrame

Question

My data is slightly unbalanced, so I am trying to apply SMOTE before fitting a logistic regression model. When I do, I get the error: KeyError: 'Only the Series name can be used for the key in Series dtype mappings.' Could someone help me figure out why? Here is the code:

import pandas as pd
from imblearn.over_sampling import SMOTE
from sklearn.model_selection import train_test_split

# dummies is my DataFrame; 'Count' is the target column
X = dummies.loc[:, dummies.columns != 'Count']
y = dummies.loc[:, dummies.columns == 'Count']

os = SMOTE(random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
columns = X_train.columns

os_data_X, os_data_y = os.fit_sample(X_train, y_train)  # here is where it errors
os_data_X = pd.DataFrame(data=os_data_X, columns=columns)
os_data_y = pd.DataFrame(data=os_data_y, columns=['Count'])

Thank you!

@QuangHoang thank you for the suggestion, but unfortunately it did not fix my error, since the error was on the fit_sample() line. — devdon, Dec 15 '20 at 18:53

score 17 · Answer 1 · answered Dec 15 '20 at 23:34

I just encountered this problem myself. As it turned out, I had a duplicate column in my dataset. Perhaps double check that this is not the case for your dataset.
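
For anyone who wants to run that check, here is a minimal sketch. The frame and its column names are made up for illustration; one way a label can end up repeated is joining two frames that share a column.

```python
import pandas as pd

# Hypothetical data: "Age" appears in both frames, so concat repeats the label.
left = pd.DataFrame({"Age": [25, 32, 47], "Count": [0, 1, 0]})
right = pd.DataFrame({"Age": [25, 32, 47], "Sex_M": [1, 0, 1]})
dummies = pd.concat([left, right], axis=1)

# Index.duplicated() marks every repeat of a label after its first occurrence.
dup_mask = dummies.columns.duplicated()
duplicate_labels = dummies.columns[dup_mask].tolist()

print("all columns:", dummies.columns.tolist())
print("duplicated labels:", duplicate_labels)
print("columns are unique:", dummies.columns.is_unique)
```

If the list of duplicated labels is empty and `is_unique` is True, duplicate columns are not the cause and one of the other answers may apply.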

Maxime


Thank you, I just checked if I do and there is not a duplicate column – devdon Dec 16 '20 at 17:58
it was also my case... Double check for duplications of the column names. – Amine Jallouli Jan 06 '21 at 05:29
Same problem and your solution fixed it! Thanks. – user2205916 Jan 20 '21 at 18:03
I had the same problem. Thanks! – igorkf Aug 29 '21 at 01:04

score 2 · Answer 2 · answered Oct 01 '22 at 10:08

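This error is usually caused by duplicate column names in your data. To check for duplicate columns, use:

df.columns[df.columns.duplicated()]

df.head() or df.columns will also show them, but a repeated label is easy to miss by eye.

To fix, drop the duplicated column(s). Note that

df.drop('column_name', axis=1, inplace=True)

removes every column carrying that label. To keep the first copy and drop only the repeats, use:

df = df.loc[:, ~df.columns.duplicated()]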
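
The difference between dropping by label and keeping the first copy of a repeated label can be sketched as follows. The frame and its column names are hypothetical, chosen only to show the behaviour.

```python
import pandas as pd

# Hypothetical frame with the label "Age" repeated.
df = pd.DataFrame(
    [[25, 0, 25, 1], [32, 1, 32, 0]],
    columns=["Age", "Count", "Age", "Sex_M"],
)

# drop() by label removes every column carrying that label.
dropped_all = df.drop("Age", axis=1)

# A boolean mask keeps the first copy of each label and discards later repeats.
deduplicated = df.loc[:, ~df.columns.duplicated()]

print(dropped_all.columns.tolist())   # ['Count', 'Sex_M']
print(deduplicated.columns.tolist())  # ['Age', 'Count', 'Sex_M']
```

Which one you want depends on whether the repeated column carries information you still need for the model.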

edited Oct 01 '22 at 10:11

answered Oct 01 '22 at 10:08

Beta Ways


score 1 · Answer 3 · answered Dec 16 '20 at 18:11

I actually just fixed this problem! I converted both inputs to plain arrays before resampling:

os_data_X, os_data_y = os.fit_sample(X_train.as_matrix(), y_train.as_matrix())

devdon


as_matrix is deprecated for more recent versions of pandas. This thread https://stackoverflow.com/questions/13187778/convert-pandas-dataframe-to-numpy-array recommends to_numpy or values. – Evelin Amorim Feb 01 '21 at 17:12
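
A minimal sketch of the same conversion with to_numpy(), for pandas versions where as_matrix() is no longer available. The two frames are hypothetical stand-ins for X_train and y_train from the question, not the asker's data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for X_train and y_train.
X_train = pd.DataFrame({"Age": [25, 32, 47, 51], "Sex_M": [1, 0, 1, 0]})
y_train = pd.DataFrame({"Count": [0, 1, 0, 0]})  # one-column DataFrame, as in the question

# to_numpy() returns the underlying values as a NumPy array.
X_arr = X_train.to_numpy()
# Flatten the single target column to shape (n_samples,).
y_arr = y_train.to_numpy().ravel()

print(type(X_arr).__name__, X_arr.shape)
print(type(y_arr).__name__, y_arr.shape)
```

Passing arrays means the sampler never sees the column labels, which seems to be why this works even when labels are repeated; any duplicate columns are still present in the data itself.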

score 1 · Answer 4 · answered Mar 23 '21 at 06:19

100% correct solution.

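Try converting your X features into an array first and then feeding them to SMOTE:

sm = SMOTE()
X = np.array(X)
y = np.array(y).ravel()  # flatten y to 1-D; works for a Series or a one-column DataFrame
X, y = sm.fit_sample(X, y)  # newer imbalanced-learn releases call this fit_resample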
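
Put together, the array approach might look like the sketch below. The data is synthetic and the feature names f1 and f2 are invented for illustration. It assumes a version of imbalanced-learn that provides fit_resample, and the import is guarded so the conversion part still runs without the package.

```python
import numpy as np
import pandas as pd

try:
    from imblearn.over_sampling import SMOTE  # third-party: imbalanced-learn
except ImportError:  # let the conversion part run when the package is missing
    SMOTE = None

# Hypothetical imbalanced data: 20 rows of class 0, 8 rows of class 1.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(28, 2)), columns=["f1", "f2"])
y = pd.DataFrame({"Count": [0] * 20 + [1] * 8})

# Convert to plain arrays; y becomes 1-D.
X_arr = np.array(X)
y_arr = np.array(y).ravel()

if SMOTE is not None:
    sm = SMOTE(random_state=0)
    X_res, y_res = sm.fit_resample(X_arr, y_arr)
    # Re-attach the column names that were lost in the array conversion.
    os_data_X = pd.DataFrame(X_res, columns=X.columns)
    os_data_y = pd.DataFrame(y_res, columns=["Count"])
    print(os_data_y["Count"].value_counts().to_dict())
```

Re-wrapping the result in DataFrames mirrors the last two lines of the question's code.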

Muhammad Imran Zaman

