32

I've tried to split my DataFrame into groups:

import pandas as pd

df = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar',
                       'foo', 'bar', 'foo', 'foo'],
                   'B' : ['1', '2', '3', '4',
                       '5', '6', '7', '8'],
                   })

grouped = df.groupby('A')

I get two groups:

     A  B
0  foo  1
2  foo  3
4  foo  5
6  foo  7
7  foo  8

     A  B
1  bar  2
3  bar  4
5  bar  6

Now I want to reset the index of each group separately:

# drop=True discards the old index instead of adding it as an 'index' column
print(grouped.get_group('foo').reset_index(drop=True))
print(grouped.get_group('bar').reset_index(drop=True))

Finally, I get the result:

     A  B
0  foo  1
1  foo  3
2  foo  5
3  foo  7
4  foo  8

     A  B
0  bar  2
1  bar  4
2  bar  6

Is there a better way to do this? (For example, automatically calling some method on each group.)

Martin Thoma
Meloun
  • `grouped = df.reset_index().groupby('A')` ? – behzad.nouri Mar 14 '14 at 14:32
  • I don't think so; I want the index reset within each group (post updated). – Meloun Mar 14 '14 at 14:47
  • Do you really need to reset the index on each group (can't it be a sub-index of the original DataFrame)? If not, why not? – Andy Hayden Mar 14 '14 at 16:25
  • 1
    Adding to @AndyHayden, would you simply like to slice your group rows by integer position? If so, you could use `.iloc`. For instance, `grouped.get_group('foo').iloc[0:3]` would return the first three rows of 'foo' while maintaining the original indexing. – Greg Mar 14 '14 at 16:57

5 Answers

40

Pass as_index=False to groupby; then you don't need reset_index to turn the grouped-by columns back into ordinary columns:

In [11]: grouped = df.groupby('A', as_index=False)

In [12]: grouped.get_group('foo')
Out[12]:
     A  B
0  foo  1
2  foo  3
4  foo  5
6  foo  7
7  foo  8

Note: As pointed out (and seen in the example above), the resulting index is not [0, 1, 2, ...]. I claim that this will rarely matter in practice. If it does, you're going to have to jump through some strange hoops, and the result will be more verbose, less readable and less efficient.
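If the 0, 1, 2, ... labels do matter for a particular group, a minimal sketch (assuming the df from the question and Python 3) is to reset the index on the group after pulling it out. The variable names below are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar',
                         'foo', 'bar', 'foo', 'foo'],
                   'B': ['1', '2', '3', '4',
                         '5', '6', '7', '8']})

grouped = df.groupby('A', as_index=False)

# get_group returns the rows with their original index labels
foo = grouped.get_group('foo')
original_labels = list(foo.index)

# reset_index(drop=True) relabels the rows 0..n-1 and
# does not add the old index as an 'index' column
foo_reset = foo.reset_index(drop=True)
reset_labels = list(foo_reset.index)

print(original_labels)
print(reset_labels)
```

This keeps the groupby object untouched and only relabels the copy you are working with.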

Andy Hayden
  • `as_index` doesn't do anything for `get_group`; try `df.groupby('A', as_index=True).get_group('foo').index`; it returns the original data-frame index (at least on `0.13.1`) – behzad.nouri Mar 14 '14 at 16:08
  • I initially thought something like this would work too, but the output indexing is different than what he is looking for. – Greg Mar 14 '14 at 16:20
  • @Greg That's a good point, however it seems unlikely that this will matter.. presumably what matters is that the grouped by columns are in columns again. – Andy Hayden Mar 14 '14 at 16:24
  • @behzad.nouri can't think of a time when this would ever be a problem / there would ever be a reason to care about the distinction. – Andy Hayden Mar 14 '14 at 16:26
  • @behzad.nouri which is to say, **it does do something**: it ensures the grouped-by columns are not in the index but are columns. – Andy Hayden Mar 14 '14 at 16:38
16
df = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar',
                       'foo', 'bar', 'foo', 'foo'],
                   'B' : ['1', '2', '3', '4',
                       '5', '6', '7', '8'],
                   })
grouped = df.groupby('A', as_index=False)

We get two groups.

grouped_index = grouped.apply(lambda x: x.reset_index(drop = True)).reset_index()

This adds two new columns, level_0 and level_1, and resets the index:

   level_0  level_1    A  B
0        0        0  bar  2
1        0        1  bar  4
2        0        2  bar  6
3        1        0  foo  1
4        1        1  foo  3
5        1        2  foo  5
6        1        3  foo  7
7        1        4  foo  8

result = grouped_index.drop('level_0', axis=1).set_index('level_1')

This creates an index that restarts at 0 within each group of "A":

          A     B
level_1     
0        bar    2
1        bar    4
2        bar    6
0        foo    1
1        foo    3
2        foo    5
3        foo    7
4        foo    8
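A similar result can be sketched without apply by using GroupBy.cumcount, which numbers the rows of each group starting from 0. This is a different technique from the one above, shown under the assumption that the df from the question is used; the names within_group and result are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar',
                         'foo', 'bar', 'foo', 'foo'],
                   'B': ['1', '2', '3', '4',
                         '5', '6', '7', '8']})

# cumcount numbers the rows of each group 0, 1, 2, ... in their original order
within_group = df.groupby('A').cumcount()

# Use that counter as the index, then sort by 'A' with a stable sort
# (mergesort) so each group's rows stay in their original relative order
result = df.set_index(within_group).sort_values('A', kind='mergesort')

print(result)
```

Here the index restarts within each group, as in the output above, but the index has no name because no level_1 column is involved.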
4
df = df.groupby('A').apply(lambda x: x.reset_index(drop=True)).drop('A', axis=1).reset_index()
Zoe
Songhua Hu
1

Something like this would work:

# dict.iteritems() exists only in Python 2; use items() in Python 3
for group, index in grouped.indices.items():
    grouped.indices[group] = range(0, len(index))

You could probably make it less verbose if you wanted to.
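A shorter alternative that leaves the groupby object's internals alone might look like the sketch below. It is not the approach above: it simply iterates over the groups and stores a re-indexed copy of each one in a dict. It assumes the df from the question, and the name groups is only illustrative:

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar',
                         'foo', 'bar', 'foo', 'foo'],
                   'B': ['1', '2', '3', '4',
                         '5', '6', '7', '8']})

# Iterating over a groupby yields (group name, sub-DataFrame) pairs;
# each sub-DataFrame gets a fresh 0..n-1 index
groups = {name: group.reset_index(drop=True)
          for name, group in df.groupby('A')}

print(groups['foo'])
print(groups['bar'])
```

Each value is an independent DataFrame, so changing one does not affect the original df or the other groups.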

Greg
  • 1
    I would be **wary** of modifying indices like this; it's used behind the scenes in other groupby methods, so this could break things if you reuse the groupby. (Kinda clever though.) – Andy Hayden Mar 14 '14 at 16:37
-3

Isn't this just `grouped = grouped.apply(lambda x: x.reset_index())`?

BAC83