How can .mean() exclude NaN values inside aggregate function?

Question

My dataset has many columns. Here are two:

Index  Graduated  Age
0      College    24
1      HighSch    18
2      College    26
3      College    Nan
4      HighSch    20

The mean of Age is simple enough:

df.Age.mean()

However, I have many other columns, therefore I'm using agg():

df.groupby('Graduated').agg({'Age':'mean'})

The error I get:

No numeric types to aggregate

If I insert a number instead of NaN, it works!

Does agg() not allow computing the mean when a column has NaN values? Is there a way around that?

By the looks of it, it is not the "number" `nan` but instead a string `"Nan"`. Change it to `np.nan` from numpy and it should work. — ayhan, Jul 17 '17 at 00:06
See https://stackoverflow.com/questions/25039328/specifying-skip-na-when-calculating-mean-of-the-column-in-a-data-frame-created for an answer to your particular question not including the `"Nan"` issue mentioned by @ayhan — AGN Gazer, Jul 17 '17 at 00:11
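To check the diagnosis in the comments above, here is a minimal sketch. It assumes the column really holds the string "Nan" (the toy frame below is rebuilt from the table in the question, not the asker's real data). It inspects the dtype and the Python types stored in the column, then shows that with a genuine `np.nan` the grouped mean skips the missing value.

```python
import numpy as np
import pandas as pd

# Toy data mirroring the question; the string "Nan" is an assumption about
# what the real column contains.
df = pd.DataFrame({
    "Graduated": ["College", "HighSch", "College", "College", "HighSch"],
    "Age": [24, 18, 26, "Nan", 20],
})

# A column mixing integers and a string is stored with dtype object.
age_dtype = df["Age"].dtype

# Count the Python types present in the column to spot the stray string.
type_counts = df["Age"].map(lambda v: type(v).__name__).value_counts()

# With a real missing value the column is numeric and the mean skips NaN.
clean = df.assign(Age=[24, 18, 26, np.nan, 20])
clean_means = clean.groupby("Graduated").agg({"Age": "mean"})

print(age_dtype)
print(type_counts)
print(clean_means)
```

If `type_counts` lists anything other than numeric types, the column needs cleaning before it can be aggregated.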

otayeby · Accepted Answer · 2017-07-17T00:29:53.387

2

As @ayhan said, the Nan values look like strings rather than real missing values. One possible solution is to replace the "Nan" strings with actual NaN values, using either of the two lines below. The first uses replace:

df['Age'] = df['Age'].replace(r'Nan', np.nan, regex=True)

The second, @ayhan's suggestion, uses the to_numeric method, which also converts the column to a numeric dtype:

df['Age'] = pd.to_numeric(df['Age'], errors='coerce')

Then run the aggregation from your question. I would do the same for every column that should be numeric, so the dtypes are correct from the start and later analysis is not affected.
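Putting the steps together, a rough end-to-end sketch might look like this. The `Score` column, its "n/a" entry, and the `numeric_cols` list are hypothetical, added only to illustrate cleaning several columns at once. The rest mirrors the sample data in the question.

```python
import pandas as pd

# Sample data from the question plus a hypothetical second column ("Score")
# that also contains a non-numeric placeholder string.
df = pd.DataFrame({
    "Graduated": ["College", "HighSch", "College", "College", "HighSch"],
    "Age": [24, 18, 26, "Nan", 20],
    "Score": [3.5, "n/a", 4.0, 3.0, 2.5],
})

# Hypothetical list of columns that are supposed to be numeric.
numeric_cols = ["Age", "Score"]

# Coerce every listed column: values that cannot be parsed become real NaN.
df[numeric_cols] = df[numeric_cols].apply(pd.to_numeric, errors="coerce")

# The grouped mean now works and ignores the NaN entries.
result = df.groupby("Graduated").agg({"Age": "mean", "Score": "mean"})
print(result)
```

Here the College group averages only the two known ages, and the HighSch group averages only the one known score.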

edited Jul 17 '17 at 00:29

answered Jul 17 '17 at 00:15


You might still need `df["Age"] = pd.to_numeric(df["Age"], errors="coerce")` if the dtype is object. – ayhan Jul 17 '17 at 00:20
I tried it and it gave me `ValueError: Unable to parse string "Nan" at position 3` – otayeby Jul 17 '17 at 00:21
That worked, would you like me to add it to the answer? – otayeby Jul 17 '17 at 00:26
Sure that would be nice. – ayhan Jul 17 '17 at 00:26
Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/149341/discussion-between-tiba-and-ayhan). – otayeby Jul 17 '17 at 00:27
That worked. Thank you @ayhan. Should I be worried that I have to use errors='coerce'? Would a newer version of pandas make my solution obsolete? – Adam Schroeder Jul 17 '17 at 04:02
@AdamSchroeder before the conversion, you can execute `df['Age'][pd.to_numeric(df['Age'], errors='coerce').isnull()]` and this will show you which cells are coerced to nan. – ayhan Jul 17 '17 at 05:42
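As a variation on the check in the last comment, the sketch below lists only the values that coercion would turn into NaN, leaving out cells that were already missing. The sample series is rebuilt from the question's table, and the names `converted`, `coerced_mask`, and `coerced_values` are illustrative.

```python
import pandas as pd

# Age column from the question, with the suspect string left in place.
age = pd.Series([24, 18, 26, "Nan", 20], name="Age")

# Dry run of the conversion; the original series is not modified.
converted = pd.to_numeric(age, errors="coerce")

# Missing after conversion but present before it: these cells were coerced.
coerced_mask = converted.isna() & age.notna()
coerced_values = age[coerced_mask]
print(coerced_values)
```

Reviewing `coerced_values` before overwriting the column is one way to confirm that `errors='coerce'` only discards placeholders and not mistyped real data.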
