Count events before a specific time for a series of items in R

Question

I have a dataframe of items with a certain number of different events which occur at different times. e.g. say I had a times of events (goal, corner, red card etc...) in various games of football. I want to count the number of each events which occurred before a certain time for each team in each game (where the time is different for each game).

So I could have a dataframe of events (where C is corner, G is goal and R is red card) as follows:

events <- data.frame(
            game_id = c(1,   1,   1,   1,   1,   1,   2,   2,   2,   2,   2,   2,   2),
            team    = c(1,   1,   2,   1,   2,   2,   1,   1,   2,   2,   2,   1,   1),
            event_id= c('C', 'C', 'C', 'G', 'C', 'R', 'C', 'C', 'C', 'C', 'G', 'G', 'C'),
            time    = c(5,   14,   27,  67,  78,  87, 10,  19,  33,  45,  60,  78,  89))

and another dataframe of times to look up for each event as follows:

eventTime <- data.frame(
             game_id = c(1, 2),
             time    = c(45, 65))

So for game 1 I want to count the number of each event for each team before the 45th minute, and for game 2 I would want to do the same thing but for the 60th minute so return something like:

game_id time t1_C t1_G t1_R t2_C t2_G t2_R
    1    45   2    0     0   1    0     0
    2    65   2    0     0   2    1     0

Since in game 1 team1 had 2 corners, 0 goals and 0 red cards before the 45th minute whilst team 2 had 1 corner, 0 goals and 0 red cards.

I have been doing this by using apply to go through and subset the data I am after and counting up the rows, however I have 1000's of rows and this takes a lot of time.

Does anyone know of the quickest way of doing this?

EDIT: I failed to mention that any game_id may appear multiple times with different times in the eventTime dataframe. E.g. game_id could appear twice with times 45 and 70, I would want to get the appropriate counts for each unique event/time combination.

TheComeOnMan · Answer 1 · 2013-10-10T14:06:38.017

Run this step by step to understand, this should give you the intended result, I think. Also, there is room for reducing the code if you're up for it -

library(data.table)
library(reshape)
library(reshape2)

events <- data.table(events)
eventTime <- data.table(eventTime)
eventTime[,TimeLimit := time]

setkeyv(eventTime,c('game_id','time'))
setkeyv(events,c('game_id','time'))

eventsSubset <- eventTime[events, roll = -Inf][!is.na(TimeLimit)]

eventsSubset <- eventsSubset[,list(Freq = .N), by = c('team','event_id','game_id','TimeLimit')]

eventsReshaped <- cast(eventsSubset, game_id + TimeLimit ~ event_id+team, fun.aggregate = sum, value = "Freq")

Output

> eventsReshaped 
  game_id TimeLimit C_1 C_2 G_2
1       1        45   2   1   0
2       2        65   2   2   1

PS- This assumes that in your whole dataset, each type of event will occur at least once. The output of this specific code aggregates over only the events found which is why the result doesn't have all event-team combinations. You can add dummy entries to your original dataset, if you want to make sure this doesn't happen.

Thanks I think this does what I asked, however there was something else which I failed to mention in my initial question (my fault sorry, will edit question now). There are some game_ids which appear multiple times in the eventTime, e.g. I could have game_id twice, once with time = 45 and once with time = 70, and for each event/time combination I would want to get the counts. I don't think the above would work for that? — user1165199, Oct 10 '13 at 14:08
Nope. First thought - loop through each row of eventstime and keep rbinding the output to a master dataset. — TheComeOnMan, Oct 10 '13 at 14:24

score 1 · Answer 2 · answered Oct 10 '13 at 14:00

It will help to rename the event time to something different from "time":

names(eventTime)[2] <- "stopTime"

Merge stopping time into main data set:

events <- merge(events,eventTime)

Load useful packages:

library(reshape2)
library(plyr)

Subset to retain the events before the stopping time:

e2 <- subset(events,time<stopTime)

Create a table of game * team * event and 'melt' it to long form:

m2 <- melt(with(e2,table(game_id,team,event_id)))

Rearrange to your preferred wide form:

m3 <- dcast(m2,game_id~team+event_id)

Put the stopping time back into the results:

merge(eventTime,m3)

score 1 · Accepted Answer · answered Oct 10 '13 at 15:31

Thanks to both of you, I think both of your answers would have answered my initial question, but wouldn't quite work for the editted question. However I have combined parts of both of your answers to get something which works for me.

I used the first part of Ben Bolkers answer by merging the data frames and subsetting where time less than stopTime. Then converted to data table and used the last two lines of Coderemifa's answer. So somethign as follows

library(reshape)
library(reshape2)
library(plyr)
names(eventTime)[2] <- "stopTime"
events <- merge(events,eventTime)
e2 <- subset(events,time<stopTime)
eventsSubset <- data.table(e2)
eventsSubset <- eventsSubset[,list(Freq = .N), by=c('team','event_id','game_id','stopTime')]
eventsReshaped <- cast(eventsSubset, game_id + stopTime~ event_id+team, fun.aggregate = sum, value = "Freq")

Count events before a specific time for a series of items in R

3 Answers3

Linked