I'm wanting to create a rolling function that conditionally counts the occurance of two columns in previous rows.
As an example, I have a dataset that looks like the following.
# Generate data
set.seed(123)
test <- data.frame(
Round = rep(1:5, times = 3),
Team = rep(c("Team 1", "Team 2", "Team 3"), each = 5),
Venue = sample(sample(c("Venue A", "Venue B"), 15, replace = T))
)
Round Team Venue
1 1 Team 1 Venue B
2 2 Team 1 Venue B
3 3 Team 1 Venue A
4 4 Team 1 Venue A
5 5 Team 1 Venue B
6 1 Team 2 Venue B
7 2 Team 2 Venue B
8 3 Team 2 Venue A
9 4 Team 2 Venue A
10 5 Team 2 Venue A
11 1 Team 3 Venue B
12 2 Team 3 Venue A
13 3 Team 3 Venue B
14 4 Team 3 Venue B
15 5 Team 3 Venue B
I want a new column that shows for each row, the number of times the team in that row has played at the venue in that row in the last 3 rounds.
I can do this quite easily with a for loop.
window <- 3
for (i in 1:nrow(dat)){
# Create index to search (if i is less than window, start at 1)
index <- max(i - window, 1):i
# Search when current row matches both team and venue
dat$VenueCount[i] <- sum(dat$Team[i] == dat$Team[index] & dat$Venue[i] == dat$Venue[index])
}
Round Team Venue VenueCount
1 1 Team 1 Venue B 1
2 2 Team 1 Venue B 2
3 3 Team 1 Venue A 1
4 4 Team 1 Venue A 2
5 5 Team 1 Venue B 2
6 1 Team 2 Venue B 1
7 2 Team 2 Venue B 2
8 3 Team 2 Venue A 1
9 4 Team 2 Venue A 2
10 5 Team 2 Venue A 3
11 1 Team 3 Venue B 1
12 2 Team 3 Venue A 1
13 3 Team 3 Venue B 2
14 4 Team 3 Venue B 3
15 5 Team 3 Venue B 3
However, I'm wanting to avoid a for loop (mostly as my actual dataset is relatively large at around ~30k rows). I'm thinking it should be doable with one of zoo
, dplyr
, purrr
or apply
but haven't been able to work it out.
Thanks