
I've been trying to figure out how to return just the first group after applying groupby.

My code looks like this:

gb = df.groupby(['col1', 'col2', 'col3', 'col4'])['col5'].sum()

What I want is to output only the first group. I've tried the get_group method, but it keeps failing (maybe because I am grouping by multiple columns?).

Here is an example of my output:

col1  col2  col3   col4  'sum'
 1     34   green   10    0.0
            yellow  30    1.5 
            orange  20    1.1 
 2     89   green   10    3.0 
            yellow   5    0.0 
            orange  10    1.0

What I want to be returned is just this:

col1  col2  col3   col4  'sum'
 1     34   green   10    0.0
            yellow  30    1.5 
            orange  20    1.1 

(Note: I added the 'sum' header here only to make clear what the last column is; pandas does not actually name that column.)
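A minimal reproducible sketch, assuming hypothetical raw data whose per-group sums match the output above. It points at one likely reason get_group fails here: after `.sum()` the result `gb` is a plain Series, while get_group is a method of the GroupBy object, and when grouping by several columns it takes a tuple key.

```python
import pandas as pd

# Hypothetical raw data: one row per (col1, col2, col3, col4) combination,
# so each group's sum equals the value shown in the question
df = pd.DataFrame({
    'col1': [1, 1, 1, 2, 2, 2],
    'col2': [34, 34, 34, 89, 89, 89],
    'col3': ['green', 'yellow', 'orange', 'green', 'yellow', 'orange'],
    'col4': [10, 30, 20, 10, 5, 10],
    'col5': [0.0, 1.5, 1.1, 3.0, 0.0, 1.0],
})

gb = df.groupby(['col1', 'col2', 'col3', 'col4'])['col5'].sum()
print(type(gb))  # a Series, which has no get_group method

# get_group lives on the GroupBy object; with several keys it takes a tuple
g = df.groupby(['col1', 'col2'])
first = g.get_group((1, 34))
print(first)
```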

smci
Hana
  • Can you show your dataframe and desired output? – user3483203 Apr 12 '18 at 14:54
  • @chrisz I edited it! Hope that explains what I'm trying to convey. – Hana Apr 12 '18 at 15:09
  • Does this answer your question? [Pandas: how to get a particular group after groupby?](https://stackoverflow.com/questions/22702486/pandas-how-to-get-a-particular-group-after-groupby) – feetwet Feb 17 '21 at 18:15

4 Answers


You can use get_group together with groups:

g = df.groupby(['col1', 'col2'])

# Key of the first (col1, col2) group, e.g. (1, 34)
first_key = list(g.groups)[0]
g.get_group(first_key).groupby(['col3', 'col4'])['col5'].sum()
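The inner groupby(['col3', 'col4']) in this answer drops col1 and col2 from the result's index. If you want output shaped like the question's, one variant (a sketch, using the same hypothetical sample data) is to regroup the selected rows by all four columns:

```python
import pandas as pd

# Hypothetical sample data matching the question's output
df = pd.DataFrame({
    'col1': [1, 1, 1, 2, 2, 2],
    'col2': [34, 34, 34, 89, 89, 89],
    'col3': ['green', 'yellow', 'orange', 'green', 'yellow', 'orange'],
    'col4': [10, 30, 20, 10, 5, 10],
    'col5': [0.0, 1.5, 1.1, 3.0, 0.0, 1.0],
})

g = df.groupby(['col1', 'col2'])
first_key = list(g.groups)[0]
first_rows = g.get_group(first_key)

# Group the selected rows by all four columns so col1 and col2 stay in the index
result = first_rows.groupby(['col1', 'col2', 'col3', 'col4'])['col5'].sum()
print(result)
```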
BENY
gb = df.groupby(['col1', 'col2', 'col3', 'col4'])['col5'].sum()

# levels[0] holds the distinct col1 values; select every row whose col1 is the first one
gb.loc[[gb.index.levels[0][0]]]
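`gb.index.levels[0]` lists the values the level can hold, and pandas does not prune it when rows are filtered out, so it may keep values that no longer appear. A slightly more defensive sketch (same hypothetical sample data) takes the first col1 label actually present in the index:

```python
import pandas as pd

# Hypothetical sample data matching the question's output
df = pd.DataFrame({
    'col1': [1, 1, 1, 2, 2, 2],
    'col2': [34, 34, 34, 89, 89, 89],
    'col3': ['green', 'yellow', 'orange', 'green', 'yellow', 'orange'],
    'col4': [10, 30, 20, 10, 5, 10],
    'col5': [0.0, 1.5, 1.1, 3.0, 0.0, 1.0],
})

gb = df.groupby(['col1', 'col2', 'col3', 'col4'])['col5'].sum()

# First col1 label that actually occurs in the index (levels[0] may keep unused values)
first_col1 = gb.index.get_level_values('col1')[0]
# A list key keeps all four index levels in the result
first_block = gb.loc[[first_col1]]
print(first_block)
```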
piRSquared
  • It works, but it's clunky. Do you think `gb.get_group(...)` should be enhanced to accept integers? – smci Feb 11 '20 at 00:22

I believe you need:

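# df here is the aggregated result, indexed by col1..col4
idx = df.index.get_level_values(0)
# keep the rows whose col1 matches the first row's col1
df = df[idx == idx[0]]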

Or use DataFrame.xs, with drop_level=False so that col1 stays in the index:

df = df.xs(df.index.levels[0][0], drop_level=False)

print(df)
                       'sum'
col1 col2 col3   col4       
1    34   green  10      0.0
          yellow 30      1.5
          orange 20      1.1
user343233
jezrael
for group_id, group_df in df.groupby(['col1', 'col2', 'col3', 'col4']):
    break

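Iterate over your groupby object and stop after the first iteration; group_id and group_df will then hold the key and the rows of the first group. Note that grouping by all four columns makes each group a single (col1, col2, col3, col4) combination; to get the whole first col1 block as in the question, group by 'col1' instead.

It's kind of an ugly workaround, but it works.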
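The same idea can be written without the for/break loop: iterating a GroupBy yields (key, sub-DataFrame) pairs, so next() on its iterator returns just the first pair. A sketch, assuming the same hypothetical sample data and grouping by 'col1' only:

```python
import pandas as pd

# Hypothetical sample data matching the question's output
df = pd.DataFrame({
    'col1': [1, 1, 1, 2, 2, 2],
    'col2': [34, 34, 34, 89, 89, 89],
    'col3': ['green', 'yellow', 'orange', 'green', 'yellow', 'orange'],
    'col4': [10, 30, 20, 10, 5, 10],
    'col5': [0.0, 1.5, 1.1, 3.0, 0.0, 1.0],
})

# next() takes only the first (key, rows) pair from the GroupBy iterator
first_key, first_df = next(iter(df.groupby('col1')))

# Sum the first block the same way the question does
first_sum = first_df.groupby(['col1', 'col2', 'col3', 'col4'])['col5'].sum()
print(first_key)
print(first_sum)
```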

user2505961