4

I have a pandas dataframe that I groupby, and then perform an aggregate calculation to get the mean for:

grouped = df.groupby(['year_month', 'company'])
means = grouped.agg({'size':['mean']})

Which gives me a dataframe back, but I can't seem to filter it to the specific company and year_month that I want:

means[(means['year_month']=='201412')]

gives me a KeyError

skunkwerk
  • 2,920
  • 2
  • 37
  • 55
  • Provide a sample code, with data. I have an idea but this line is slightly confusing `means = grouped.agg({'size':['mean']})` – Leb Oct 29 '15 at 01:40

2 Answers2

2

The issue is that you are grouping based on 'year_month' and 'company' . Hence in the means DataFrame, year_month and company would be part of the index (MutliIndex). You cannot access them as you access other columns.

One method to do this would be to get the values of the level 'year_month' of index . Example -

means.loc[means.index.get_level_values('year_month') == '201412']

Demo -

In [38]: df
Out[38]:
   A  B   C
0  1  2  10
1  3  4  11
2  5  6  12
3  1  7  13
4  2  8  14
5  1  9  15

In [39]: means = df.groupby(['A','B']).mean()

In [40]: means
Out[40]:
      C
A B
1 2  10
  7  13
  9  15
2 8  14
3 4  11
5 6  12

In [41]: means.loc[means.index.get_level_values('A') == 1]
Out[41]:
      C
A B
1 2  10
  7  13
  9  15
Anand S Kumar
  • 88,551
  • 18
  • 188
  • 176
  • Another method would be to use `.reset_index()` , if you do not want those columns as index. Then you can do what you were trying – Anand S Kumar Oct 29 '15 at 01:46
1

As already pointed out, you will end up with a 2 level index. You could try to unstack the aggregated dataframe:

means = df.groupby(['year_month', 'company']).agg({'size':['mean']}).unstack(level=1)

This should give you a single 'year_month' index, 'company' as columns and your aggregate size as values. You can then slice by the index:

means.loc['201412']
Turanga1
  • 123
  • 1
  • 7