How to get the first group in a groupby of multiple columns?

Question

I've been trying to figure out how to return just the first group after applying groupby.

My code looks like this:

gb = df.groupby(['col1', 'col2', 'col3', 'col4'])['col5'].sum()

What I want is for just that first group to be output. I've been trying the get_group method, but it keeps failing (maybe because I am grouping by multiple columns?)

Here is an example of my output:

col1  col2  col3   col4  'sum'
 1     34   green   10    0.0
            yellow  30    1.5 
            orange  20    1.1 
 2     89   green   10    3.0 
            yellow   5    0.0 
            orange  10    1.0

What I want to be returned is just this:

col1  col2  col3   col4  'sum'
 1     34   green   10    0.0
            yellow  30    1.5 
            orange  20    1.1

(Note: I added the 'sum' label here only to make clear what the last column is; pandas does not actually name that column.)
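A minimal reproducible setup, assuming the data shown above (the `df` contents are reconstructed from the sample output and are an assumption, not the asker's real data). One likely reason `get_group` fails is that `gb` is already the aggregated Series, and a Series has no `get_group` method; that method belongs to the `GroupBy` object, where a multi-column key is passed as a tuple.

```python
import pandas as pd

# Hypothetical data reconstructed from the sample output in the question
df = pd.DataFrame({
    'col1': [1, 1, 1, 2, 2, 2],
    'col2': [34, 34, 34, 89, 89, 89],
    'col3': ['green', 'yellow', 'orange', 'green', 'yellow', 'orange'],
    'col4': [10, 30, 20, 10, 5, 10],
    'col5': [0.0, 1.5, 1.1, 3.0, 0.0, 1.0],
})

gb = df.groupby(['col1', 'col2', 'col3', 'col4'])['col5'].sum()

# gb is the aggregated Series; it has no get_group method
print(type(gb).__name__, hasattr(gb, 'get_group'))

# get_group lives on the GroupBy object; with several keys, pass a tuple
grouped = df.groupby(['col1', 'col2'])
first = grouped.get_group((1, 34))
print(first)
```

Note that groupby sorts its keys by default, so `col3` may come out in alphabetical order rather than the order shown in the sample.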

@chrisz I edited it! Hope that explains what I'm trying to convey. — Hana, Apr 12 '18 at 15:09
Does this answer your question? [Pandas: how to get a particular group after groupby?](https://stackoverflow.com/questions/22702486/pandas-how-to-get-a-particular-group-after-groupby) — feetwet, Feb 17 '21 at 18:15

BENY · Accepted Answer · score 18 · 2018-04-12T15:14:12.567

You can use get_group together with groups:

g = df.groupby(['col1', 'col2'])

# list(g.groups)[0] is the key of the first group, e.g. (1, 34)
g.get_group(list(g.groups)[0]).groupby(['col3', 'col4'])['col5'].sum()
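A self-contained run of this approach, assuming data reconstructed from the question's sample output (the `df` contents are an assumption). The idea is to group by the outer columns only, take the first key from `g.groups`, and aggregate just that sub-frame:

```python
import pandas as pd

# Hypothetical data reconstructed from the question's sample output
df = pd.DataFrame({
    'col1': [1, 1, 1, 2, 2, 2],
    'col2': [34, 34, 34, 89, 89, 89],
    'col3': ['green', 'yellow', 'orange', 'green', 'yellow', 'orange'],
    'col4': [10, 30, 20, 10, 5, 10],
    'col5': [0.0, 1.5, 1.1, 3.0, 0.0, 1.0],
})

g = df.groupby(['col1', 'col2'])

# g.groups maps each (col1, col2) key to its row labels; for this data
# the first key is (1, 34)
first_key = list(g.groups)[0]

# Aggregate only the rows belonging to that first group
result = g.get_group(first_key).groupby(['col3', 'col4'])['col5'].sum()
print(result)
```

The result is indexed by `col3` and `col4` only, since `col1` and `col2` are fixed within the selected group.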

edited Apr 12 '18 at 15:14

answered Apr 12 '18 at 15:08

BENY


`list(g.groups)[0]` seems suboptimal; it would be better to use `next(iter(g.groups))` – Vulwsztyn Nov 09 '22 at 22:33
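Fetching the first key lazily needs `iter()`: `g.groups` is a dict-like mapping, which is iterable but not itself an iterator, so calling `next()` on it directly raises `TypeError`. A sketch assuming the same reconstructed data; the saving is only the list copy, since `g.groups` itself is still built for every group.

```python
import pandas as pd

# Hypothetical data reconstructed from the question's sample output
df = pd.DataFrame({
    'col1': [1, 1, 1, 2, 2, 2],
    'col2': [34, 34, 34, 89, 89, 89],
    'col3': ['green', 'yellow', 'orange', 'green', 'yellow', 'orange'],
    'col4': [10, 30, 20, 10, 5, 10],
    'col5': [0.0, 1.5, 1.1, 3.0, 0.0, 1.0],
})

g = df.groupby(['col1', 'col2'])

# iter() over the mapping yields its keys; next() takes the first one
first_key = next(iter(g.groups))

# Same key as list(g.groups)[0], without materialising the full key list
assert first_key == list(g.groups)[0]
print(g.get_group(first_key))
```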

score 5 · Answer 2 · answered Apr 12 '18 at 15:12


gb = df.groupby(['col1', 'col2', 'col3', 'col4'])['col5'].sum()

# levels[0] holds the distinct col1 values; the list keeps the col1 level
gb.loc[[gb.index.levels[0][0]]]
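A runnable version of this approach, assuming the reconstructed data from the question. `gb.index.levels[0]` holds the distinct `col1` values (sorted for a fresh groupby result), and wrapping the label in a list makes `.loc` keep the `col1` level instead of dropping it. One caveat: after filtering, a MultiIndex can retain unused level values, so `levels[0][0]` is only reliable on an unfiltered result like this one.

```python
import pandas as pd

# Hypothetical data reconstructed from the question's sample output
df = pd.DataFrame({
    'col1': [1, 1, 1, 2, 2, 2],
    'col2': [34, 34, 34, 89, 89, 89],
    'col3': ['green', 'yellow', 'orange', 'green', 'yellow', 'orange'],
    'col4': [10, 30, 20, 10, 5, 10],
    'col5': [0.0, 1.5, 1.1, 3.0, 0.0, 1.0],
})

gb = df.groupby(['col1', 'col2', 'col3', 'col4'])['col5'].sum()

first_col1 = gb.index.levels[0][0]

# A list of labels keeps all four index levels in the result
subset = gb.loc[[first_col1]]
# A scalar label drops the col1 level instead
subset_dropped = gb.loc[first_col1]
print(subset)
```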


piRSquared


It works, but it's clunky. Do you think `gb.get_group(...)` should be enhanced to accept integers? – smci Feb 11 '20 at 00:22
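On selecting a group by integer position: with the existing API, `GroupBy.ngroup()` labels every row with its group number (0 for the first group in group order), so a group can be picked by number rather than by key. A sketch assuming the reconstructed data, not a proposed change to `get_group`:

```python
import pandas as pd

# Hypothetical data reconstructed from the question's sample output
df = pd.DataFrame({
    'col1': [1, 1, 1, 2, 2, 2],
    'col2': [34, 34, 34, 89, 89, 89],
    'col3': ['green', 'yellow', 'orange', 'green', 'yellow', 'orange'],
    'col4': [10, 30, 20, 10, 5, 10],
    'col5': [0.0, 1.5, 1.1, 3.0, 0.0, 1.0],
})

g = df.groupby(['col1', 'col2'])

# Group number for every row of df: 0 for (1, 34), 1 for (2, 89)
group_number = g.ngroup()

# Select the rows of group 0, then aggregate them as in the question
first = df[group_number == 0]
result = first.groupby(['col3', 'col4'])['col5'].sum()
print(result)
```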

score 4 · Answer 3 · edited Sep 01 '23 at 02:25


I believe you need:

# df here is the aggregated result (gb in the question), indexed by col1..col4
idx = df.index.get_level_values(0)
df = df[idx == idx[0]]
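A sketch of the boolean-mask version, assuming `gb` is the aggregated Series from the question (built here from reconstructed data). `get_level_values(0)` returns the `col1` value for every row, so `idx[0]` is the `col1` value of the first row in the result's order, which for a sorted groupby result is also the smallest. This matches on `col1` alone, so rows with a different `col2` but the same `col1` would be included too.

```python
import pandas as pd

# Hypothetical data reconstructed from the question's sample output
df = pd.DataFrame({
    'col1': [1, 1, 1, 2, 2, 2],
    'col2': [34, 34, 34, 89, 89, 89],
    'col3': ['green', 'yellow', 'orange', 'green', 'yellow', 'orange'],
    'col4': [10, 30, 20, 10, 5, 10],
    'col5': [0.0, 1.5, 1.1, 3.0, 0.0, 1.0],
})

gb = df.groupby(['col1', 'col2', 'col3', 'col4'])['col5'].sum()

# col1 value of every row of the aggregated result
idx = gb.index.get_level_values(0)

# Keep the rows whose col1 equals the first row's col1
first = gb[idx == idx[0]]
print(first)
```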

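Or use DataFrame.xs, passing drop_level=False so that col1 stays in the index as in the output below:

df = df.xs(df.index.levels[0][0], drop_level=False)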

print (df)
                       'sum'
col1 col2 col3   col4       
1    34   green  10      0.0
          yellow 30      1.5
          orange 20      1.1
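A sketch contrasting the two `drop_level` settings, assuming the Series `gb` from the question built from reconstructed data; `Series.xs` takes the same `drop_level` argument as `DataFrame.xs`. By default the selected `col1` level is removed from the index, which is why `drop_level=False` is needed to get the four-level output above.

```python
import pandas as pd

# Hypothetical data reconstructed from the question's sample output
df = pd.DataFrame({
    'col1': [1, 1, 1, 2, 2, 2],
    'col2': [34, 34, 34, 89, 89, 89],
    'col3': ['green', 'yellow', 'orange', 'green', 'yellow', 'orange'],
    'col4': [10, 30, 20, 10, 5, 10],
    'col5': [0.0, 1.5, 1.1, 3.0, 0.0, 1.0],
})

gb = df.groupby(['col1', 'col2', 'col3', 'col4'])['col5'].sum()
key = gb.index.levels[0][0]

# Default drop_level=True: the selected col1 level disappears from the index
dropped = gb.xs(key)
# drop_level=False keeps col1 in the index
kept = gb.xs(key, drop_level=False)
print(dropped)
print(kept)
```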

edited Sep 01 '23 at 02:25

user343233


answered Apr 12 '18 at 14:56

jezrael


score 3 · Answer 4 · answered Aug 11 '21 at 08:22

# Group by the two outer columns so the first group holds all rows of (1, 34)
for group_id, group_df in df.groupby(['col1', 'col2']):
    break

Iterate over your groupby object and stop after the first iteration. group_id will then hold the first group's key and group_df its rows.

Kind of an ugly workaround, but it works.
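The same first group can be taken without a `for`/`break` loop by calling `next()` on an iterator over the GroupBy object, which yields `(key, sub-DataFrame)` pairs in group order. A sketch assuming the reconstructed data; it groups by `col1` and `col2` because, with this data, grouping by all four columns would make every group a single row.

```python
import pandas as pd

# Hypothetical data reconstructed from the question's sample output
df = pd.DataFrame({
    'col1': [1, 1, 1, 2, 2, 2],
    'col2': [34, 34, 34, 89, 89, 89],
    'col3': ['green', 'yellow', 'orange', 'green', 'yellow', 'orange'],
    'col4': [10, 30, 20, 10, 5, 10],
    'col5': [0.0, 1.5, 1.1, 3.0, 0.0, 1.0],
})

g = df.groupby(['col1', 'col2'])

# Iterating a GroupBy yields (key, rows) pairs; next() stops after the first
group_id, group_df = next(iter(g))
print(group_id)
print(group_df)
```

Unlike the `g.groups` approaches, this does not build the full key-to-rows mapping first.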
