I have a dataset that I'm grouping, and then trying to remove any groups that have no data in a particular column. For example:
import numpy as np
import pandas as pd

df = pd.DataFrame({'movie': ['thg', 'thg', 'mol', 'mol', 'lob', 'lob'],
                   'rating': [3., 4., 5., np.nan, np.nan, np.nan],
                   'name': ['John', np.nan, 'Terry', 'Graham', 'Eric', np.nan]})
g = df.groupby('movie')
  movie    name  rating
0   thg    John       3
1   thg     NaN       4
2   mol   Terry       5
3   mol  Graham     NaN
4   lob    Eric     NaN
5   lob     NaN     NaN
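One quick way to see which groups have no ratings at all is to count the non-null values in the rating column per group. This is a minimal sketch, assuming a recent pandas release rather than 0.12.0; rating_counts is an illustrative name, not something from the question:

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({'movie': ['thg', 'thg', 'mol', 'mol', 'lob', 'lob'],
                   'rating': [3., 4., 5., np.nan, np.nan, np.nan],
                   'name': ['John', np.nan, 'Terry', 'Graham', 'Eric', np.nan]})
g = df.groupby('movie')

# count() skips NaN, so a movie nobody rated gets a count of 0.
# Group keys are sorted by default: lob, mol, thg.
rating_counts = g['rating'].count()
print(rating_counts)  # lob 0, mol 1, thg 2
```

Any group with a count of 0 is one to drop; here that is only lob.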
I would like to remove the group lob from the dataset, as nobody has rated it. I've tried
mask = g['rating'].mean().isnull()
g.filter(~mask)
which gives me an error of TypeError: 'Series' object is not callable. That's kind of hackish, so I've also tried
g.filter(lambda group: group.isnull().all())
which seems more Pythonic, but it gives me an error of ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all().
How can I filter out a group, and why do I get these errors? Any additional information about groupby in general would also be helpful.
I'm using pandas 0.12.0, Python 2.7.5, and Mac OS X 10.8.5.
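On the TypeError: GroupBy.filter expects a function, which it calls once per group with that group's sub-DataFrame. Passing the boolean Series mask means pandas ends up trying to call the Series itself, hence "'Series' object is not callable". The mask is still usable, just through ordinary boolean indexing on df rather than through filter. A minimal sketch, assuming a recent pandas release; mask follows the question, while rated_movies, filtered and error_message are illustrative names:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'movie': ['thg', 'thg', 'mol', 'mol', 'lob', 'lob'],
                   'rating': [3., 4., 5., np.nan, np.nan, np.nan],
                   'name': ['John', np.nan, 'Terry', 'Graham', 'Eric', np.nan]})
g = df.groupby('movie')

# One boolean per movie: True where the mean rating is NaN (no ratings at all).
mask = g['rating'].mean().isnull()

# filter() calls its first argument on every group, so handing it a Series fails.
error_message = None
try:
    g.filter(~mask)
except TypeError as exc:
    error_message = str(exc)

# Use the mask for plain boolean indexing instead:
# keep only the rows whose movie has at least one rating.
rated_movies = mask[~mask].index
filtered = df[df['movie'].isin(rated_movies)]
print(filtered)
```

The isin step maps the per-group answer back onto the individual rows, which is the part filter would otherwise do for you.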
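On the ValueError: inside the lambda, group.isnull().all() is evaluated column by column, so it returns a Series of booleans (one per column) rather than a single True/False, and pandas 0.12 could not reduce that to one truth value; newer releases may report the same problem with a different exception. Note also that filter keeps the groups for which the function returns True, so even a scalar version of that test would keep lob and drop the rest. A minimal sketch of a lambda that returns one bool per group, assuming a recent pandas release; lob_rows, per_column and filtered are illustrative names:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'movie': ['thg', 'thg', 'mol', 'mol', 'lob', 'lob'],
                   'rating': [3., 4., 5., np.nan, np.nan, np.nan],
                   'name': ['John', np.nan, 'Terry', 'Graham', 'Eric', np.nan]})
g = df.groupby('movie')

# What the original lambda computes for one group: one boolean per column.
lob_rows = df[df['movie'] == 'lob']
per_column = lob_rows.isnull().all()
print(per_column)  # a Series indexed by column name, not a single bool

# Select the column first so the test collapses to a single bool per group,
# and return True for the groups to KEEP (at least one non-null rating).
filtered = g.filter(lambda group: group['rating'].notnull().any())
print(filtered)
```

An equivalent scalar test is group['rating'].count() > 0; either form gives filter exactly one boolean per group.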