I have a pandas DataFrame that I group by several columns, some of which contain null values:
>>> gp = df.groupby(columns)
I expect there to be ~1000 distinct groups, and this is what I get using len():
>>> len(gp)
1000
However, when I apply an aggregate function, I only get ~50 rows back!
>>> gp.mean().shape[0]
50
Any ideas? Is this because the columns I group by contain null values? Is there a way to force pandas to treat null values like any other value, and produce the output you would get from SQL GROUP BY and AVG?
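A minimal sketch of the scenario, assuming pandas 1.1 or later, where `groupby` accepts a `dropna` parameter. The toy frame, its column names, and the variable names `default_means` and `sql_like_means` are made up for illustration and are not from my real data. By default, rows whose group key contains a null are left out of the aggregate; passing `dropna=False` keeps them as groups of their own, which is closer to what SQL GROUP BY does.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: both grouping columns contain nulls.
df = pd.DataFrame({
    "a": ["x", "x", None, None],
    "b": [1.0, np.nan, 1.0, np.nan],
    "value": [10.0, 20.0, 30.0, 40.0],
})
columns = ["a", "b"]

# Default behaviour: any row with a null in a grouping column is dropped,
# so only the fully non-null key ("x", 1.0) survives.
default_means = df.groupby(columns)["value"].mean()

# dropna=False treats null as a regular key value, so every distinct
# combination of "a" and "b" becomes its own group.
sql_like_means = df.groupby(columns, dropna=False)["value"].mean()

print(len(default_means), len(sql_like_means))
```

On older pandas versions without `dropna`, one possible workaround is to replace the nulls in the grouping columns with a sentinel value via `fillna` before grouping, as long as the sentinel cannot collide with a real value.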