I am trying to learn R at the moment, and people always say the best way to learn it is to use it on personal projects you are passionate about, so I am trying my hand at football analytics.
I have created a tibble with 4 variables: name, game_id, epa, and n_dropbacks.
For this question, all you need to be concerned about is name.
The name column contains many repeated names, which is fine and expected, but now I want to remove every name that appears fewer than 30 times.
Code that created the tibble and what it now looks like:
all_qbs <- rp_pbp_12_20 %>%
  filter(!is.na(epa), !is.na(name)) %>%
  group_by(name, game_id) %>%
  summarize(
    epa = mean(qb_epa),
    n_dropbacks = sum(pass)
  ) %>%
  filter(n_dropbacks >= 15) %>%
  ungroup()
head(all_qbs)
# A tibble: 6 x 4
  name     game_id            epa n_dropbacks
  <chr>    <chr>            <dbl>       <dbl>
1 A.Dalton 2012_01_CIN_BAL -0.379          42
2 A.Dalton 2012_02_CLE_CIN  0.527          40
3 A.Dalton 2012_03_CIN_WAS  0.455          33
4 A.Dalton 2012_04_CIN_JAX  0.528          34
5 A.Dalton 2012_05_MIA_CIN -0.144          52
6 A.Dalton 2012_06_CIN_CLE -0.199          51
In other words, I want to remove all quarterbacks who have fewer than 30 games in their career in which they dropped back to pass 15 or more times.
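To make the goal concrete, here is a minimal sketch of the kind of filter being described. It assumes each row of all_qbs is one qualifying game, so counting rows per name is the same as counting games. The toy_qbs tibble and the qbs_30_plus name are made up for illustration and stand in for the real data; this is one possible approach with dplyr, not necessarily the best one.

```r
library(dplyr)

# Toy stand-in for all_qbs (hypothetical data, not the real play-by-play):
# quarterback "A" has 30 qualifying games, quarterback "B" has only 2.
toy_qbs <- tibble(
  name        = c(rep("A", 30), rep("B", 2)),
  game_id     = paste0("game_", seq_len(32)),
  epa         = 0,
  n_dropbacks = 20
)

# Group by name and keep only groups with at least 30 rows,
# i.e. quarterbacks with at least 30 qualifying games.
qbs_30_plus <- toy_qbs %>%
  group_by(name) %>%
  filter(n() >= 30) %>%
  ungroup()

unique(qbs_30_plus$name)
```

On the real data the same pattern would presumably be applied to all_qbs in place of toy_qbs. Note that this counts rows per name, so two different quarterbacks who share the same abbreviated name would be counted together.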