
This has to be the easiest thing to do, but I just can't figure it out.

I have a dataframe like this:

    station ts                  EventType
0   BLOOR   2020-02-04 20:35:00 Valid Entry
1   BLOOR   2020-02-07 17:45:00 Valid Exit
2   COLLEGE 2020-02-05 14:20:00 Valid Exit
3   BAY     2020-02-04 16:55:00 Valid Entry
4   UNION   2020-02-07 08:20:00 Valid Exit

and I just want a dataframe that splits the "EventType" column into separate columns and counts each value, grouped by station and ts:

    station ts                  EntryCount  ExitCount
0   BLOOR   2020-02-04 20:35:00 5           2
1   BLOOR   2020-02-04 20:40:00 10          6
2   COLLEGE 2020-02-04 20:35:00 3           7
3   COLLEGE 2020-02-04 20:40:00 6           8
4   COLLEGE 2020-02-04 20:45:00 5           9

I've started down these paths unsuccessfully:

    df.pivot_table(index=['station','ts'], columns="EventType", values="EventType", aggfunc=sum)

or

    df['EntryCount'] = df.groupby(["ts", "station"], as_index=False, sort=False)["EventType"].apply(
        lambda etype: etype == "Valid Entry"
    ).sum()

(Worth mentioning: the dataframe is quite large as well.)

Alexander Witte

2 Answers

How about:

df.groupby(['station','ts']).EventType.value_counts().unstack()
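To get from the unstacked result to the exact shape in the question (flat columns named EntryCount/ExitCount, zeros instead of NaN), a little cleanup can be chained on. A sketch on the sample data; the column renames are my assumption about the desired output, not part of the original answer:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "station": ["BLOOR", "BLOOR", "COLLEGE", "BAY", "UNION"],
    "ts": ["2020-02-04 20:35:00", "2020-02-07 17:45:00",
           "2020-02-05 14:20:00", "2020-02-04 16:55:00",
           "2020-02-07 08:20:00"],
    "EventType": ["Valid Entry", "Valid Exit", "Valid Exit",
                  "Valid Entry", "Valid Exit"],
})

counts = (
    df.groupby(["station", "ts"])
      .EventType.value_counts()
      .unstack(fill_value=0)          # missing combinations become 0, not NaN
      .rename(columns={"Valid Entry": "EntryCount",
                       "Valid Exit": "ExitCount"})
      .reset_index()                  # station/ts back to ordinary columns
)
counts.columns.name = None            # drop the leftover "EventType" axis label
```

With fill_value=0 the counts stay integers, which avoids the float/NaN columns you'd otherwise get after unstacking.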
Quang Hoang
You just need to use 'size' as aggfunc instead of sum:

df_final = df.pivot_table(index=['station','ts'], 
                          columns="EventType", 
                          values="EventType", aggfunc='size')

Out[587]:
EventType                    Valid Entry  Valid Exit
station ts
BAY     2020-02-04 16:55:00          1.0         NaN
BLOOR   2020-02-04 20:35:00          1.0         NaN
        2020-02-07 17:45:00          NaN         1.0
COLLEGE 2020-02-05 14:20:00          NaN         1.0
UNION   2020-02-07 08:20:00          NaN         1.0
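To avoid the NaN/float cells in the output above, pivot_table also accepts fill_value=0. In the sketch below I additionally drop the values argument, since 'size' only counts rows and doesn't need one; that deviation from the answer's call is my own assumption:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "station": ["BLOOR", "BLOOR", "COLLEGE", "BAY", "UNION"],
    "ts": ["2020-02-04 20:35:00", "2020-02-07 17:45:00",
           "2020-02-05 14:20:00", "2020-02-04 16:55:00",
           "2020-02-07 08:20:00"],
    "EventType": ["Valid Entry", "Valid Exit", "Valid Exit",
                  "Valid Entry", "Valid Exit"],
})

# Count rows per (station, ts, EventType); absent combinations become 0
out = df.pivot_table(index=["station", "ts"],
                     columns="EventType",
                     aggfunc="size",
                     fill_value=0)
```

Because nothing is left as NaN, the result keeps an integer dtype rather than being upcast to float.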
Andy L.