
Is there a way to slice a DataFrameGroupBy object?

For example, if I have:

df = pd.DataFrame({'A': [2, 1, 1, 3, 3], 'B': ['x', 'y', 'z', 'r', 'p']})

   A  B
0  2  x
1  1  y
2  1  z
3  3  r
4  3  p

dfg = df.groupby('A')

Now, the returned GroupBy object is indexed by values from A, and I would like to select a subset of it, e.g. to perform aggregation. It could be something like

dfg.loc[1:2].agg(...)

or, for a specific column,

dfg['B'].loc[1:2].agg(...)

EDIT. To make it clearer: by slicing the GroupBy object I mean accessing only a subset of groups. In the above example, the GroupBy object will contain 3 groups, for A = 1, A = 2, and A = 3. For some reason, I may only be interested in the groups for A = 1 and A = 2.

Zaus
Konstantin
  • What is the intended output, say for example `sum`? – Zero Sep 15 '17 at 10:03
  • Possible duplicate https://stackoverflow.com/questions/43305214/creating-slices-of-dataframe-groupby-groups – zimmerrol Sep 15 '17 at 10:04
  • nth does exactly this: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.core.groupby.GroupBy.nth.html?highlight=nth#pandas.core.groupby.GroupBy.nth – Jeff Sep 15 '17 at 10:46

2 Answers


It seems you need a custom function with iloc. Note that agg must return a single aggregated value per group, so the sliced values are joined into one string here:

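# first three rows of each group; store in `out` so that df is not overwritten
out = df.groupby('A')['B'].agg(lambda x: ','.join(x.iloc[0:3]))
print (out)
A
1    y,z
2      x
3    r,p
Name: B, dtype: object

# second and third row of each group
out = df.groupby('A')['B'].agg(lambda x: ','.join(x.iloc[1:3]))
print (out)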
A
1    z
2     
3    p
Name: B, dtype: object

For multiple columns:

df = pd.DataFrame({'A': [2, 1, 1, 3, 3], 
                   'B': ['x', 'y', 'z', 'r', 'p'], 
                   'C': ['g', 'y', 'y', 'u', 'k']})
print (df)
   A  B  C
0  2  x  g
1  1  y  y
2  1  z  y
3  3  r  u
4  3  p  k

df = df.groupby('A').agg(lambda x: ','.join(x.iloc[1:3]))
print (df)
   B  C
A      
1  z  y
2      
3  p  k
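If the sliced rows themselves are wanted rather than one joined string per group, the same positional slice can be taken without agg. The following is only a sketch of an alternative, not part of the approach above: it assumes the two-column df from the question and uses cumcount, which numbers the rows inside each group starting at 0. The name `rows` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [2, 1, 1, 3, 3], 'B': ['x', 'y', 'z', 'r', 'p']})

# position of every row inside its own group: 0, 1, 2, ...
pos = df.groupby('A').cumcount()

# keep positions 1 and 2, i.e. the equivalent of iloc[1:3] within each group
rows = df[pos.between(1, 2)]
print(rows)
```

Groups with fewer than two rows (here A = 2) simply contribute nothing, which mirrors the empty string in the aggregated output.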
jezrael

If I understand correctly, you only want some groups, but those are supposed to be returned completely:

    A   B
1   1   y
2   1   z
0   2   x

You can solve your problem by extracting the keys and then selecting groups based on those keys.

Assuming you already know the groups:

pd.concat([dfg.get_group(1),dfg.get_group(2)])

If you don't know the group names and just want any n groups (here the first two keys), this might work:

pd.concat([dfg.get_group(n) for n in list(dict(list(dfg)).keys())[:2]])
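The group keys can also be read from the `groups` attribute of the GroupBy object, which maps each key to the row labels of that group. A minimal sketch, assuming the df from the question; `keys`, `wanted` and `subset` are illustrative names, and the keys are sorted explicitly so the selection does not depend on dictionary order.

```python
import pandas as pd

df = pd.DataFrame({'A': [2, 1, 1, 3, 3], 'B': ['x', 'y', 'z', 'r', 'p']})
dfg = df.groupby('A')

# dfg.groups maps each group key to the index labels of its rows
keys = sorted(dfg.groups)
wanted = keys[:2]  # the two smallest keys

# collect the complete groups for the selected keys into one DataFrame
subset = pd.concat([dfg.get_group(k) for k in wanted])
print(subset)
```

As with the get_group calls above, the result is a plain DataFrame, so it would have to be grouped again before aggregating.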

The output in both cases is a normal DataFrame, not a DataFrameGroupBy object, so it might be smarter to first filter your DataFrame and only aggregate afterwards:

df[df['A'].isin([1,2])].groupby('A')

The same for unknown groups (a set has no guaranteed order, so which two groups are picked is arbitrary):

df[df['A'].isin(list(set(df['A']))[:2])].groupby('A')
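Put together with an aggregation, the filter-first approach could look like the sketch below. It assumes the df from the question and reuses the string join from the other answer purely as an example aggregation; `result` and `ranged` are illustrative names. The `between` variant is the closest analogue of the `.loc[1:2]` slice asked about, since both ends are included by default.

```python
import pandas as pd

df = pd.DataFrame({'A': [2, 1, 1, 3, 3], 'B': ['x', 'y', 'z', 'r', 'p']})

# keep only the rows of the wanted groups, then group and aggregate
result = (df[df['A'].isin([1, 2])]
          .groupby('A')['B']
          .agg(lambda s: ','.join(s)))
print(result)

# selecting a range of keys instead of an explicit list
ranged = (df[df['A'].between(1, 2)]
          .groupby('A')['B']
          .agg(lambda s: ','.join(s)))
print(ranged)
```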

I believe there are some Stack Overflow answers referring to this, such as "How to access pandas groupby dataframe by key".

Anne