5

I'm trying to obtain the mode of a column in a groupby object, but I'm getting this error: incompatible index of inserted column with frame index.

This is the line that raises the error, and I'm not sure how to fix it. Any help would be appreciated.

dfBitSeq['KMeans'] = df.groupby('OnBitSeq')['KMeans'].apply(lambda x: x.mode())
John
  • 1
    Pandas mode returns a Series, unlike mean and median, which return a scalar. So you just need to select the first value using x.mode().iloc[0] – Vaishali Feb 06 '18 at 14:58
  • This is exactly what I needed. Can you submit this as an answer, and I'll mark it as accepted? – John Feb 06 '18 at 15:18

3 Answers

12

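Series.mode() returns a Series, unlike mean and median, which return a scalar, because a group can have more than one mode. The result of apply therefore carries an extra index level and no longer lines up with the index of the frame you assign it to. Select the first mode of each group with x.mode().iloc[0]: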

dfBitSeq['KMeans'] = df.groupby('OnBitSeq')['KMeans'].apply(lambda x: x.mode().iloc[0])
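
To see why the index differs, here is a minimal sketch. The sample data is hypothetical (only the column names come from the question), and it assumes the default group_keys=True behaviour of groupby.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "OnBitSeq": [1, 1, 1, 2, 2, 3, 3],
    "KMeans":   [5, 4, 5, 3, 3, 7, 8],
})

grouped = df.groupby("OnBitSeq")["KMeans"]

# mode() returns a Series per group (group 3 has a tie), so apply() stacks
# the pieces under a two-level index: the group key plus the position
# within that group's modes.
all_modes = grouped.apply(lambda x: x.mode())
print(all_modes.index.nlevels)

# .iloc[0] reduces each group to a single value (the smallest mode, since
# mode() returns its values sorted), so the result is indexed by OnBitSeq only.
first_mode = grouped.apply(lambda x: x.mode().iloc[0])
print(first_mode)
```

Note that .iloc[0] silently drops ties; if ties matter for your data, you would need to decide explicitly how to break them.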
Vaishali
2

You can use scipy.stats.mode. Example below.

import pandas as pd
from scipy.stats import mode

df = pd.DataFrame([[1, 5], [2, 3], [3, 5], [2, 4], [2, 3], [1, 4], [1, 5]],
                  columns=['OnBitSeq', 'KMeans'])

#    OnBitSeq  KMeans
# 0         1       5
# 1         2       3
# 2         3       5
# 3         2       4
# 4         2       3
# 5         1       4
# 6         1       5

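# keepdims=True (SciPy >= 1.9) keeps the result as an array, so .mode[0] is the
# smallest mode; the older mode(x)[0][0] form fails once mode() returns scalars.
modes = df.groupby('OnBitSeq')['KMeans'].apply(lambda x: mode(x, keepdims=True).mode[0]).reset_index()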

#    OnBitSeq  KMeans
# 0         1       5
# 1         2       3
# 2         3       5

If you need to add this back to the original dataframe:

df['Mode'] = df['OnBitSeq'].map(modes.set_index('OnBitSeq')['KMeans'])
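
As a possible alternative that is not part of the answer above, groupby's transform can compute the per-group mode and broadcast it back to the original rows in one step, without SciPy or a separate map. A minimal sketch using the same example data:

```python
import pandas as pd

df = pd.DataFrame([[1, 5], [2, 3], [3, 5], [2, 4], [2, 3], [1, 4], [1, 5]],
                  columns=['OnBitSeq', 'KMeans'])

# transform broadcasts each group's scalar result to every row of that group,
# so the output has the same index as df and can be assigned directly.
df['Mode'] = df.groupby('OnBitSeq')['KMeans'].transform(lambda x: x.mode().iloc[0])

print(df)
```

Because the result is aligned with df's own index, this form sidesteps the index mismatch from the question, at the cost of calling a Python lambda once per group.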
jpp
0

You could look at the question "Attach a calculated column to an existing dataframe".

This error looks similar and the answer is pretty useful.

Red Panda