
I tried to replicate my problem in a simple example to avoid sharing my actual data.

A sample pandas DataFrame:

import pandas as pd
df_sample = pd.DataFrame([[1, 2], [3, 4], [5, 6], [float('nan'), 8]], columns=["A", "B"])

I tried calculating the mean of all columns using:

df_sample.mean() 

This works well, but

df_sample.mode()

doesn't behave like mean(), as the output below shows:

Output: [screenshot of the df_sample.mode() output]
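A minimal sketch of the difference, rebuilding the same sample frame so it runs on its own: `mean()` reduces each column to a single number and returns a Series, whereas `DataFrame.mode()` returns every tied modal value, one row per value, and pads shorter columns with NaN. Because every value in A and B occurs exactly once, all of them come back as modes.

```python
import pandas as pd

df_sample = pd.DataFrame([[1, 2], [3, 4], [5, 6], [float('nan'), 8]], columns=["A", "B"])

# mean() collapses each column to one number, so the result is a Series
means = df_sample.mean()
print(type(means).__name__)  # Series

# mode() keeps all tied values (NaN is dropped by default), so the result
# is a DataFrame with one row per modal value. Column A has only three
# modes, so its fourth row is padded with NaN.
modes = df_sample.mode()
print(modes)
print(modes.shape)  # (4, 2)
```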

Any ideas why, and how can I get the mode of all columns using something similar to df.mode()? By the way, my goal is to impute missing data in multiple variables with the mode, but this didn't replace the NaNs with the mode in my original data:

df_sample['A'].fillna(df_sample['A'].mode())
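A sketch of why the fillna line above leaves the NaN in place, assuming the same sample data: `Series.mode()` returns a Series of tied modes labelled 0, 1, 2, and `fillna` given a Series matches values by index label instead of using a single scalar. The names `modes_a` and `attempt` are illustrative only.

```python
import pandas as pd

df_sample = pd.DataFrame([[1, 2], [3, 4], [5, 6], [float('nan'), 8]], columns=["A", "B"])

modes_a = df_sample['A'].mode()
print(modes_a.index.tolist())  # [0, 1, 2]: one label per tied mode

# fillna with a Series fills each NaN from the value sharing its index
# label. The NaN in column A sits at label 3, which modes_a lacks, so
# nothing is filled. fillna also returns a new Series; df_sample is unchanged.
attempt = df_sample['A'].fillna(modes_a)
print(attempt.isna().sum())  # 1

# Taking one scalar mode (the first, i.e. smallest, of the tied values)
# and assigning the result back does fill the gap.
df_sample['A'] = df_sample['A'].fillna(modes_a.iloc[0])
print(df_sample['A'].tolist())  # [1.0, 3.0, 5.0, 1.0]
```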

But now I realize the problem seems to lie in how the mode itself is defined. Any ideas? Thanks in advance!

Bharat Ram Ammu
  • The mode is the most frequently occurring value in a list. In both of your columns, each value appears only once, so every value is a mode of its column. – Scott Boston Jun 25 '19 at 14:43
  • There is no single mode for your data (right?!). – cs95 Jun 25 '19 at 14:44
  • @cs95 yes realized that. Thanks! – Bharat Ram Ammu Jun 25 '19 at 15:00
  • But @cs95, this question isn't answered by the one marked as duplicate. This question is about the 'mode' method, not about using value counts the traditional way. – Bharat Ram Ammu Jun 25 '19 at 15:03
  • You need to scroll past the accepted answer ;) – cs95 Jun 25 '19 at 15:03
  • Just learned that fillna only works here with the 'iloc[0]' suffix, unlike with mean, because the mean is unique and the mode often isn't. 'fillna(df.mode().iloc[0])' is the answer for others. Reference here: https://stackoverflow.com/a/32619781/7027147. @cs95 I suggest changing the duplicate to this answer instead. – Bharat Ram Ammu Jun 25 '19 at 15:05
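Applying the `fillna(df.mode().iloc[0])` approach from the last comment to several columns at once might look like the sketch below. The frame `df` is a hypothetical example with one clear mode per column (including a text column, a common case for mode imputation), not the asker's data.

```python
import pandas as pd

# Hypothetical data: column A has mode 1.0, column B has mode "y"
df = pd.DataFrame({
    "A": [1, 1, 5, float('nan')],
    "B": ["x", None, "y", "y"],
})

# Row 0 of df.mode() holds the first (smallest) mode of every column,
# returned as a Series indexed by column name.
first_modes = df.mode().iloc[0]

# DataFrame.fillna with a Series fills each column with its own value.
filled = df.fillna(first_modes)
print(filled)
```

When a column has tied modes, as in the original `df_sample`, `iloc[0]` quietly picks the smallest of them, because `mode()` returns its values in sorted order. To impute only some columns, the same pattern should work on a subset, e.g. `df[cols] = df[cols].fillna(df[cols].mode().iloc[0])`, where `cols` is a hypothetical list of column names.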
