What does indexing into the result of a pandas groupby do?

Question

Let's say I have this dataframe,

import pandas as pd

df = pd.DataFrame([['a', 'b', 'c'],
                   ['1', '2', '3'],
                   ['4', '5', '6']],
                  index=['A', 'B', 'C'],
                  columns=['x', 'y', 'z'])

    x   y   z
A   a   b   c
B   1   2   3
C   4   5   6

I saw the code df.groupby('x')['y']. What does the ['y'] do here? I understand the ('x') part.
Thanks in advance!

Would you like to look at [this answer](https://stackoverflow.com/a/53781645/8333806)? Merci — abdoulsn, Dec 15 '19 at 03:54
@NicolasGervais It returns `pandas.core.groupby.generic.SeriesGroupBy object`. — jayko03, Dec 15 '19 at 03:56
Here, `('x')` produces a DataFrameGroupBy, whereas adding `['y']` produces a SeriesGroupBy in pandas. — Joy, Dec 15 '19 at 03:57
`df.groupby('x')` groups on column `x`, while `df.groupby('x')['y']` makes a subsequent function operate only on column `y` after grouping on `x`. For example, `df.groupby('x')['y'].sum()` gives the sum of `y` after grouping on `x`, whereas `df.groupby('x').sum()` returns the sum of all columns (not only `y`) after grouping on `x`. — anky, Dec 15 '19 at 04:32
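
The distinction drawn in the comments above can be sketched with the df from the question. This is only an illustration: first() is used here as an arbitrary reduction because the columns hold strings, and the variable names are made up for the example.

```python
import pandas as pd

df = pd.DataFrame([['a', 'b', 'c'],
                   ['1', '2', '3'],
                   ['4', '5', '6']],
                  index=['A', 'B', 'C'],
                  columns=['x', 'y', 'z'])

grouped = df.groupby('x')    # groups every remaining column by x
grouped_y = grouped['y']     # narrows the grouped object to column y

# Show which kind of groupby object each expression produces.
print(type(grouped).__name__)
print(type(grouped_y).__name__)

# first() is an arbitrary reduction chosen for illustration.
first_all = grouped.first()  # DataFrame with columns y and z
first_y = grouped_y.first()  # Series named y
print(first_all)
print(first_y)
```

Indexing with ['y'] does not compute anything by itself; it only restricts whatever is called next to that one column.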

Nicolas Gervais · Accepted Answer · 2019-12-15T11:21:13.527

2

The new index is made of the groups you created with groupby(). The ['y'] selects the column y within those groups. On its own that gives you a groupby object rather than values, so you also need to call a function on the grouped rows, like sum(). Here's an example:

import pandas as pd

df = pd.DataFrame({'Name':['Mark', 'Laura', 'Adam', 'Roger', 'Anna'],
                   'City':['Lisbon', 'Montreal', 'Lisbon', 'Berlin', 'Glasgow'],
                   'Height':[173.4, 151.8, 179.3, 169.1, 166.4]})
print(df)

    Name      City  Height
0   Mark    Lisbon   173.4
1  Laura  Montreal   151.8
2   Adam    Lisbon   179.3
3  Roger    Berlin   169.1
4   Anna   Glasgow   166.4

Return the sum of the heights, grouped by City:

df.groupby('City').sum()['Height']

Out[46]: 
City
Berlin      169.1
Glasgow     166.4
Lisbon      352.7
Montreal    151.8
Name: Height, dtype: float64

The new index is the group, and you selected one column to display. You can put the column selection either before or after sum().
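
As a rough check of the "before or after" point, both orderings can be compared on the same data. The numeric_only=True argument is an addition of this sketch, not part of the answer's code; it is there so the non-numeric Name column is skipped regardless of pandas version.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Mark', 'Laura', 'Adam', 'Roger', 'Anna'],
                   'City': ['Lisbon', 'Montreal', 'Lisbon', 'Berlin', 'Glasgow'],
                   'Height': [173.4, 151.8, 179.3, 169.1, 166.4]})

# Select the column first, then aggregate only that column.
before = df.groupby('City')['Height'].sum()

# Aggregate the numeric columns first, then select the column.
# numeric_only=True is an assumption of this sketch (see lead-in).
after = df.groupby('City').sum(numeric_only=True)['Height']

print(before)
print(after)
```

Selecting the column first means pandas only has to aggregate that one column, which tends to matter on wide frames.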

edited Dec 15 '19 at 11:21

answered Dec 15 '19 at 03:58

Nicolas Gervais


If you group values, you need to tell pandas how you want them grouped. By mean? By sum? Because that's what aggregation is – Nicolas Gervais Dec 15 '19 at 04:09
@MadPhysicist Have you read the Pandas docs? It sounds like you need a tutorial or guide; Stack Overflow isn’t really meant for this. Also, I think the code you shared is already doing that. – AMC Dec 15 '19 at 04:16
I'm not OP. Just a heckler from the sidelines. I'll take your advice regardless. – Mad Physicist Dec 15 '19 at 04:19
OK. I've looked through the docs. I see nothing that indicates that you have to aggregate after grouping. – Mad Physicist Dec 15 '19 at 04:22
From the docs: `DataFrameGroupBy` Returns: Depends on the __calling object__ and returns groupby object that contains information about the groups. – Nicolas Gervais Dec 15 '19 at 04:25
@MadPhysicist Oops, sorry for assuming you were OP! Indeed, I also assumed you could just get the column directly, no need to call anything. Unfortunately I’m not at my computer right now so I can’t check. – AMC Dec 15 '19 at 04:28
@NicolasGervais Would managing to print the output of a groupby like the one in the OP (without aggregation or a function) constitute enough proof? – AMC Dec 15 '19 at 04:41
`print(list(df.groupby('x')['y']))` where `df` is the one from the OP. (I see you’re in Montreal too, hi!) – AMC Dec 15 '19 at 04:44
@AlexanderCécile What is OP? – jayko03 Dec 15 '19 at 16:50
@jayko03 OP = original poster – AMC Dec 15 '19 at 18:54
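
The `print(list(...))` suggestion from the comments can be expanded into a short loop. This is a sketch using the df from the question; it shows that the grouped column can be iterated as (key, Series) pairs without calling any aggregation. The name pairs is made up for the example.

```python
import pandas as pd

df = pd.DataFrame([['a', 'b', 'c'],
                   ['1', '2', '3'],
                   ['4', '5', '6']],
                  index=['A', 'B', 'C'],
                  columns=['x', 'y', 'z'])

# Iterating a SeriesGroupBy yields one (group key, Series of y values) pair
# per distinct value of x; no aggregation function is involved.
pairs = []
for key, ys in df.groupby('x')['y']:
    pairs.append((key, ys.tolist()))
    print(key, ys.tolist())
```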

score 0 · Answer 2 · answered Jan 03 '20 at 14:01

groupby() creates groups from df, putting rows that share the same x value into the same group. Then, for each of these groups, you grab the y column. If you go on to count how many entries each group has, it's like value_counts() (a shortcut for this groupby() operation).
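
A minimal sketch of the value_counts() comparison, assuming a small made-up frame with a repeated x value and no missing values in y (with missing values the two counts could differ, since count() skips them). The names toy, via_groupby and via_value_counts are hypothetical.

```python
import pandas as pd

# Hypothetical data: 'a' appears twice in x, 'b' once.
toy = pd.DataFrame({'x': ['a', 'a', 'b'],
                    'y': [1, 2, 3]})

# Count the y entries inside each x group.
via_groupby = toy.groupby('x')['y'].count()

# Count how often each x value occurs.
via_value_counts = toy['x'].value_counts()

print(via_groupby.to_dict())
print(via_value_counts.to_dict())
```

The two results carry different Series names and may be ordered differently, so comparing them as dictionaries keeps the check simple.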

abdoulsn
