I have a dataframe of items with a certain number of different events which occur at different times. e.g. say I had a times of events (goal, corner, red card etc...) in various games of football. I want to count the number of each events which occurred before a certain time for each team in each game (where the time is different for each game).
So I could have a dataframe of events (where C is corner, G is goal and R is red card) as follows:
events <- data.frame(
game_id = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2),
team = c(1, 1, 2, 1, 2, 2, 1, 1, 2, 2, 2, 1, 1),
event_id= c('C', 'C', 'C', 'G', 'C', 'R', 'C', 'C', 'C', 'C', 'G', 'G', 'C'),
time = c(5, 14, 27, 67, 78, 87, 10, 19, 33, 45, 60, 78, 89))
and another dataframe of times to look up for each event as follows:
eventTime <- data.frame(
game_id = c(1, 2),
time = c(45, 65))
So for game 1 I want to count the number of each event for each team before the 45th minute, and for game 2 I would want to do the same thing but for the 60th minute so return something like:
game_id time t1_C t1_G t1_R t2_C t2_G t2_R
1 45 2 0 0 1 0 0
2 65 2 0 0 2 1 0
Since in game 1 team1 had 2 corners, 0 goals and 0 red cards before the 45th minute whilst team 2 had 1 corner, 0 goals and 0 red cards.
I have been doing this by using apply to go through and subset the data I am after and counting up the rows, however I have 1000's of rows and this takes a lot of time.
Does anyone know of the quickest way of doing this?
EDIT: I failed to mention that any game_id may appear multiple times with different times in the eventTime dataframe. E.g. game_id could appear twice with times 45 and 70, I would want to get the appropriate counts for each unique event/time combination.