
This has to be the easiest thing to do, but I just can't figure it out.

I have a dataframe like this:

    station ts                  EventType
0   BLOOR   2020-02-04 20:35:00 Valid Entry
1   BLOOR   2020-02-07 17:45:00 Valid Exit
2   COLLEGE 2020-02-05 14:20:00 Valid Exit
3   BAY     2020-02-04 16:55:00 Valid Entry
4   UNION   2020-02-07 08:20:00 Valid Exit

and I just want a dataframe that splits the "EventType" column into separate columns and counts each value, grouped by station and ts:

    station ts                  EntryCount  ExitCount
0   BLOOR   2020-02-04 20:35:00 5           2
1   BLOOR   2020-02-04 20:40:00 10          6
2   COLLEGE 2020-02-04 20:35:00 3           7
3   COLLEGE 2020-02-04 20:40:00 6           8
4   COLLEGE 2020-02-04 20:45:00 5           9

I've started down these paths unsuccessfully:

    df.pivot_table(index=['station','ts'], columns="EventType", values="EventType", aggfunc=sum)

or

    df['EntryCount'] = df.groupby(["ts", "station"], as_index=False, sort=False)["EventType"].apply(
        lambda etype: etype == "Valid Entry"
    ).sum()

(Worth mentioning: the dataframe is quite large as well.)

Alexander Witte

2 Answers

How about:

df.groupby(['station','ts']).EventType.value_counts().unstack()
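To get from the unstacked result to the exact shape in the question (flat columns named EntryCount/ExitCount, zeros instead of NaN), a little cleanup can be chained on. A sketch on the sample data; the column renames are my assumption about the desired output, not part of the original answer:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "station": ["BLOOR", "BLOOR", "COLLEGE", "BAY", "UNION"],
    "ts": ["2020-02-04 20:35:00", "2020-02-07 17:45:00",
           "2020-02-05 14:20:00", "2020-02-04 16:55:00",
           "2020-02-07 08:20:00"],
    "EventType": ["Valid Entry", "Valid Exit", "Valid Exit",
                  "Valid Entry", "Valid Exit"],
})

counts = (
    df.groupby(["station", "ts"])
      .EventType.value_counts()
      .unstack(fill_value=0)          # missing combinations become 0, not NaN
      .rename(columns={"Valid Entry": "EntryCount",
                       "Valid Exit": "ExitCount"})
      .reset_index()                  # station/ts back to ordinary columns
)
counts.columns.name = None            # drop the leftover "EventType" axis label
```

With fill_value=0 the counts stay integers, which avoids the float/NaN columns you'd otherwise get after unstacking.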
Quang Hoang
You just need to use 'size' as aggfunc instead of sum:

df_final = df.pivot_table(index=['station','ts'], 
                          columns="EventType", 
                          values="EventType", aggfunc='size')

Out[587]:
EventType                    Valid Entry  Valid Exit
station ts
BAY     2020-02-04 16:55:00          1.0         NaN
BLOOR   2020-02-04 20:35:00          1.0         NaN
        2020-02-07 17:45:00          NaN         1.0
COLLEGE 2020-02-05 14:20:00          NaN         1.0
UNION   2020-02-07 08:20:00          NaN         1.0
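To avoid the NaN/float cells in the output above, pivot_table also accepts fill_value=0. In the sketch below I additionally drop the values argument, since 'size' only counts rows and doesn't need one; that deviation from the answer's call is my own assumption:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "station": ["BLOOR", "BLOOR", "COLLEGE", "BAY", "UNION"],
    "ts": ["2020-02-04 20:35:00", "2020-02-07 17:45:00",
           "2020-02-05 14:20:00", "2020-02-04 16:55:00",
           "2020-02-07 08:20:00"],
    "EventType": ["Valid Entry", "Valid Exit", "Valid Exit",
                  "Valid Entry", "Valid Exit"],
})

# Count rows per (station, ts, EventType); absent combinations become 0
out = df.pivot_table(index=["station", "ts"],
                     columns="EventType",
                     aggfunc="size",
                     fill_value=0)
```

Because nothing is left as NaN, the result keeps an integer dtype rather than being upcast to float.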
Andy L.