
Is there a better way to iterate over the sub-series of a pandas Series with a MultiIndex? This is similar to MultiIndex-based indexing in pandas.

import pandas as pd

df = pd.DataFrame([[1, 1, 1], [1, 2, 1], [1, 2, 2],
                   [2, 1, 1], [2, 2, 1], [2, 3, 1], [2, 3, 2], [2, 3, 3]],
                  columns=['a', 'b', 'c'])
g = df.groupby(['a', 'b']).size()
for label in g.index.levels[0]:
    print(label)
    print(g[label])

This will give:

1
b
1    1
2    2
dtype: int64
2
b
1    1
2    1
3    3
dtype: int64

Something like this pseudo-code:

for label, series in g.get_sub_series(level=0):
    print(label)
    print(series)
  • Maybe `for label, series in g.groupby(level=0)`? – Psidom Feb 01 '17 at 14:43
  • Should I use groupby on a Series that is the result of another groupby + fast count? Won't this re-compute the groups? The DataFrame may be ~300MiB. – damisan Feb 01 '17 at 15:36
  • 300MB shouldn't be too large for pandas, and maybe you can try groupby('b') first and then, for each sub-group, groupby('a'). I am not sure what you are trying to do; this is what I can suggest. – Psidom Feb 01 '17 at 15:48
  • @Psidom: Yes, 300 MiB is fine. My question was about doing the same work again (grouping), which would take some time given the volume of data. And yes, I can group by 'b' first. That's not the issue. The code in my original post does the job. I want to know specifically how to iterate over level 0 of the MultiIndexed Series without re-computing anything and without n hash look-ups (if it is possible). That way I learn something new about pandas :) – damisan Feb 01 '17 at 19:55
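A minimal sketch of the approach suggested in the comments. Note that calling `groupby(level=0)` on the already-aggregated Series `g` only hashes the (small) MultiIndex labels, not the original DataFrame; also, each yielded sub-series keeps both index levels, so `droplevel(0)` (available in recent pandas) is used here to match the `g[label]` output shown in the question:

```python
import pandas as pd

df = pd.DataFrame([[1, 1, 1], [1, 2, 1], [1, 2, 2],
                   [2, 1, 1], [2, 2, 1], [2, 3, 1], [2, 3, 2], [2, 3, 3]],
                  columns=['a', 'b', 'c'])
g = df.groupby(['a', 'b']).size()

# Group the aggregated Series by its first index level only; this works
# on the MultiIndex labels rather than re-scanning the DataFrame columns.
for label, series in g.groupby(level=0):
    print(label)
    # Each sub-series still carries both index levels ('a' and 'b');
    # drop the outer level to mirror g[label] from the question.
    print(series.droplevel(0))
```

On older pandas where `Series.droplevel` is unavailable, `series.reset_index(level=0, drop=True)` gives the same result.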

0 Answers