
I have a pandas DataFrame that I group by several columns, some of which contain null values:

>>> gp = df.groupby(columns)

I expect there to be ~1000 distinct groups, and this is what I get using len():

>>> len(gp)
1000

However, when I apply an aggregate function, I only get ~50 rows back!

>>> gp.mean().shape[0]
50

Any ideas? Is this because the columns I group by contain null values? Is there a way to force pandas to treat null values like any other value, and produce the output you would get from SQL's GROUP BY and AVG?
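
For reference, here is a minimal reproduction with made-up data (not my actual DataFrame):

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({"key": ["a", "a", np.nan, "b", np.nan],
...                    "value": [1, 2, 3, 4, 5]})
>>> df.groupby("key").mean()  # the two rows with a NaN key are silently dropped
     value
key
a      1.5
b      4.0

(For readers on newer pandas: version 1.1 added a `dropna=False` argument to `groupby` that keeps null keys as their own group.)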

ostrokach
    I believe this is a duplicate of this [post](http://stackoverflow.com/a/18431417/3453737). In short, use `fillna` to replace the value and then do the groupby. – shawnheide Oct 14 '16 at 03:24
  • @shawnheide Yup, thanks! I had even upvoted that post before. However, this behaviour is **very** unintuitive to me. I would much rather call `df = df.dropna(subset=columns)` when I want to exclude nulls than call `df.loc[:, columns] = df[columns].fillna('')` before the groupby and `df.loc[:, columns] = df[columns].replace('', np.nan)` after it when I want to include them (sketched below)... – ostrokach Oct 14 '16 at 03:30
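
A minimal sketch of the `fillna`-sentinel workaround described in the comments, assuming the grouping columns hold strings and that the empty string never occurs as a genuine value:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({"key": ["a", "a", np.nan, "b", np.nan],
...                    "value": [1, 2, 3, 4, 5]})
>>> columns = ["key"]
>>> df.loc[:, columns] = df[columns].fillna("")  # sentinel in, before the groupby
>>> df.groupby(columns).mean()
     value
key
       4.0
a      1.5
b      4.0
>>> df.loc[:, columns] = df[columns].replace("", np.nan)  # sentinel back out, after

The former-NaN rows now form their own group (the blank index label), matching what SQL's GROUP BY would report.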

0 Answers