Sample Input:
GAME_ID TIME PER EVENT
0 0022000394 12:00 1 12
1 0022000394 12:00 1 10
2 0022000394 11:36 1 1
3 0022000394 11:24 1 1
4 0022000394 11:04 1 1
5 0022000394 10:41 1 1
6 0022000394 10:30 1 2
7 0022000394 10:29 1 4
8 0022000394 10:17 1 1
9 0022000394 10:01 1 1
10 0022000394 9:48 1 2
11 0022000394 9:46 1 4
12 0022000394 9:42 1 6
13 0022000394 9:42 1 3
14 0022000394 9:42 1 3
15 0022000394 9:25 1 1
16 0022000394 9:15 1 1
17 0022000394 9:15 1 6
18 0022000394 9:15 1 3
19 0022000394 8:53 1 1
20 0022000394 8:33 1 1
21 0022000394 8:22 1 1
22 0022000394 8:16 1 2
23 0022000394 8:16 1 4
24 0022000394 8:12 1 2
I have a dataframe where I get a group of rows using groupby
.
If that group contains 3 rows where the column EVENTMSGTYPE
contains all of [1, 6, & 3]
I want to update the row in the original dataframe where EVENTMSGTYPE == 1
Current Working Solution (Slow)
# Group by
for _, data in df.groupby(['GAME_ID', 'TIME', 'PER']):
# If EVENT in group contains 1, 6, and 3 then update original df
if all(x in list(data.EVENT) for x in [1, 6, 3]):
# Update original df row where EVENT equals 1, should only have one value
index = data[data.EVENT == 1].index.values[0]
# Set UPDATED to True
df.at[index, 'UPDATED'] = True
Expected Output:
GAME_ID TIME PER EVENT UPDATED
...
16 0022000394 9:15 1 1 True
...
My dataframe has 1,694,389 rows and this takes ~53 seconds to run on my machine, can the performance of this be improved?