Access Aggregated Values in Group By Data Frame

Question

I have a data frame with values such as:

name    action           time
------------------------------
Sam     enters building  20:00
Chris   enters building  20:15
Sam     walks up stairs  20:20
Steve   leaves building  20:25
James   enters building  20:30
Chris   takes elevator   20:32
Sam     leaves building  20:35
Chris   leaves building  20:40
Sam     enters building  20:45
...

I want to get the counts of distinct actions, per person. This I am able to accomplish with df.groupby(['name','action'], as_index=False).size()

However, this gives me a visual, which I can't do much with. I would like to use these counts for different purposes. How can I put this data into an accessible data frame, so that I can get things like the amount of times each person entered and left the building. For example, I could call like df['name' == 'Sam' & 'action'=='enters building] and assign that count to a variable?

Please provide a [mcve]. Have you tried anything, done any research? — AMC, Apr 23 '20 at 23:43

score 1 · Answer 1 · answered Apr 23 '20 at 22:20

1

To make it a dataframe, you can use .reset_index() and then set it to df.

df=df.groupby(['name','action'], as_index=False).size().reset_index()

answered Apr 23 '20 at 22:20

David Erickson

16,433
2
19
35

score 1 · Answer 2 · answered Apr 23 '20 at 22:24

Just use count instead - size is not aggregating function in pandas that's why your command as_index=False doesn't do anything.

df.groupby(['name','action'], as_index=False).count()

Outputs:

    name           action  time
0  Chris  enters building     1
1  Chris  leaves building     1
2  Chris   takes elevator     1
3  James  enters building     1
4    Sam  enters building     2
5    Sam  leaves building     1
6    Sam  walks up stairs     1
7  Steve  leaves building     1

score 1 · Answer 3 · answered Apr 23 '20 at 22:36

1

You can also use your initial idea. In this case you obtain a Pandas series. You can directly index the values in which you are interested.

table = df.groupby(['name','action'], as_index=False).size()
table['Sam']['enters building']

answered Apr 23 '20 at 22:36

Moriah Mulah

11
2

score 0 · Answer 4 · answered Apr 23 '20 at 22:25

This is a different solution where you add the size so you can still work with the original dataframe:

df['size'] = df.groupby(['name', 'action']).transform(np.size)
df.iloc[[0,2,-1], :]


#   name  action            time    size
# 0 Sam   enters building   20:00   2
# 2 Sam   walks up stairs   20:20   1
# 8 Sam   enters building   20:45   2

Access Aggregated Values in Group By Data Frame

4 Answers4