0

I have a data frame with values such as:

name    action           time
------------------------------
Sam     enters building  20:00
Chris   enters building  20:15
Sam     walks up stairs  20:20
Steve   leaves building  20:25
James   enters building  20:30
Chris   takes elevator   20:32
Sam     leaves building  20:35
Chris   leaves building  20:40
Sam     enters building  20:45
...

I want to get the counts of distinct actions, per person. This I am able to accomplish with df.groupby(['name','action'], as_index=False).size()

However, this gives me a visual, which I can't do much with. I would like to use these counts for different purposes. How can I put this data into an accessible data frame, so that I can get things like the amount of times each person entered and left the building. For example, I could call like df['name' == 'Sam' & 'action'=='enters building] and assign that count to a variable?

AMC
  • 2,642
  • 7
  • 13
  • 35
Sam Cohen-Devries
  • 2,025
  • 2
  • 18
  • 35

4 Answers4

1

To make it a dataframe, you can use .reset_index() and then set it to df.

df=df.groupby(['name','action'], as_index=False).size().reset_index()
David Erickson
  • 16,433
  • 2
  • 19
  • 35
1

Just use count instead - size is not aggregating function in pandas that's why your command as_index=False doesn't do anything.

df.groupby(['name','action'], as_index=False).count()

Outputs:

    name           action  time
0  Chris  enters building     1
1  Chris  leaves building     1
2  Chris   takes elevator     1
3  James  enters building     1
4    Sam  enters building     2
5    Sam  leaves building     1
6    Sam  walks up stairs     1
7  Steve  leaves building     1
Grzegorz Skibinski
  • 12,624
  • 2
  • 11
  • 34
1

You can also use your initial idea. In this case you obtain a Pandas series. You can directly index the values in which you are interested.

table = df.groupby(['name','action'], as_index=False).size()
table['Sam']['enters building']
0

This is a different solution where you add the size so you can still work with the original dataframe:

df['size'] = df.groupby(['name', 'action']).transform(np.size)
df.iloc[[0,2,-1], :]


#   name  action            time    size
# 0 Sam   enters building   20:00   2
# 2 Sam   walks up stairs   20:20   1
# 8 Sam   enters building   20:45   2
jcaliz
  • 3,891
  • 2
  • 9
  • 13