I am trying to learn R, and people always say the best way to learn it is to use it on personal projects you are passionate about, so I am trying my hand at football analytics.

I have created a tibble with 4 variables: name, game_id, epa, and n_dropbacks.

For this question, all you need to be concerned about is name.

The name column contains many repeated names, which is fine and expected, but now I want to remove every name that appears fewer than 30 times.

Code that created the tibble and what it now looks like:

all_qbs <- rp_pbp_12_20 %>%
  filter(!is.na(epa), !is.na(name)) %>%
  group_by(name, game_id) %>%
  summarize(
    epa = mean(qb_epa),
    n_dropbacks = sum(pass)) %>%
  filter(n_dropbacks >= 15) %>%
  ungroup()

 head(all_qbs)
# A tibble: 6 x 4
  name     game_id            epa n_dropbacks
  <chr>    <chr>            <dbl>       <dbl>
1 A.Dalton 2012_01_CIN_BAL -0.379          42
2 A.Dalton 2012_02_CLE_CIN  0.527          40
3 A.Dalton 2012_03_CIN_WAS  0.455          33
4 A.Dalton 2012_04_CIN_JAX  0.528          34
5 A.Dalton 2012_05_MIA_CIN -0.144          52
6 A.Dalton 2012_06_CIN_CLE -0.199          51

In other words, I want to remove all quarterbacks who have fewer than 30 games in their career in which they dropped back to pass 15 or more times.

1 Answer

Something like:

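# Count games per quarterback and keep those with at least 30
cc <- all_qbs %>% count(name) %>% filter(n >= 30)
# Keep only the rows of all_qbs whose name survived the filter
most_qbs <- right_join(all_qbs, cc, by = "name")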

(or, for the second step, filter(all_qbs, name %in% cc$name), which keeps the same rows without adding the n column)
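
To make the idea easy to try without the play-by-play data, here is a minimal sketch on a made-up tibble. The names `toy_qbs`, `QB.A`, `QB.B`, `QB.C`, `eligible`, `kept`, and `kept_join` are hypothetical stand-ins, not part of the original data. It shows two common dplyr ways to keep only names that appear at least 30 times: a grouped filter on `n()`, and a `count()` followed by `semi_join()`.

```r
library(dplyr)

# Toy stand-in for all_qbs: one row per qualifying game (all values are made up)
toy_qbs <- tibble(
  name    = c(rep("QB.A", 35), rep("QB.B", 30), rep("QB.C", 12)),
  game_id = paste0("game_", seq_along(name))
)

# Option 1: group by name and keep groups with at least 30 rows
kept <- toy_qbs %>%
  group_by(name) %>%
  filter(n() >= 30) %>%
  ungroup()

# Option 2: count rows per name, then keep matching rows with a semi join
eligible <- toy_qbs %>%
  count(name) %>%
  filter(n >= 30)
kept_join <- semi_join(toy_qbs, eligible, by = "name")

# Rows remaining per name after filtering
count(kept, name)
```

Both options should leave the same rows. The grouped filter does it in one pipeline, while the semi join keeps the list of eligible names around in case it is useful later.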

Ben Bolker
  • there's probably a near-duplicate somewhere but I don't know if I can find it easily ... – Ben Bolker Oct 10 '20 at 00:17
  • Hey you nailed it thank you so much, I have been trying to figure it out for hours, this is my first day of doing legitimate work in R and I was getting very frustrated, thank you. – evan_patton Oct 10 '20 at 00:32