0
    GameID     Drive        down    yrdln        PlayType      sp
2009091000      1           1        22           run          0
2009091000      1           2        25           pass         0
2009091000      1           3        32           sack         0
2009091000      2           1        42           pass         0
2009091000      2           2        44           run          0
2009091000      2           3        43           pass         0
2009091000      2           4        33           Field Goal   1
2009091001      1           1        5            pass         0
2009091001      1           2        10           pass         1

The above data is similar to a data frame I am currently working with. I am attempting to determine the "yrdln" for the first entry for each "drive" and the "yrdln" for the last entry for each "drive". But each GameID has many drive 1, drive 2, and so on.

I have been trying to Group_by GameID, then Drive, and then somehow determine the first and last entry for each, but I am having trouble doing this. I also still want the sp and PlayType included for these entries, or at least included for the "last" entry as I want to know the outcome of the drive (if there was a scoring play and what type of play it was)

dput:

df <- structure(list(GameID = c(2009091000, 2009091000, 2009091000, 
2009091000, 2009091000, 2009091000, 2009091000, 2009091001, 2009091001
), Drive = c(1, 1, 1, 2, 2, 2, 2, 1, 1), down = c(1, 2, 3, 1, 
2, 3, 4, 1, 2), yrdln = c(22, 25, 32, 42, 44, 43, 33, 5, 10), 
    PlayType = c("run", "pass", "sack", "pass", "run", "pass", 
    "Field Goal", "pass", "pass"), sp = c(0, 0, 0, 0, 0, 0, 1, 
    0, 1)), row.names = c(NA, -9L), class = c("tbl_df", "tbl", 
"data.frame"))
Matt
  • 7,255
  • 2
  • 12
  • 34
Calum
  • 3
  • 1

1 Answers1

0

Using dplyr

library(dplyr)

df <- df %>%
  group_by(GameID,Drive) %>%
  mutate(min_down = min(down),
         max_down = max(down)) %>%
  filter(down == min_down | down == max_down) %>%
  dplyr::select(-c("min_down", "max_down"))

This gets you the following result:

      GameID Drive  down yrdln PlayType      sp
       <dbl> <dbl> <dbl> <dbl> <chr>      <dbl>
1 2009091000     1     1    22 run            0
2 2009091000     1     3    32 sack           0
3 2009091000     2     1    42 pass           0
4 2009091000     2     4    33 Field Goal     1
5 2009091001     1     1     5 pass           0
6 2009091001     1     2    10 pass           1
Matt
  • 7,255
  • 2
  • 12
  • 34
  • This definitely has helped! The only thing it is missing is being able to compare the drive starting yrdln to the last plays sp and PlayType. is there a way to separate these as separate variables? maybe something like GameID, Drive, start_yrdln, last_playType, last_sp. Or anything close to this? Thank you so much for already answering and any further help! – Calum Apr 08 '20 at 01:26
  • I was actually able to use what you gave to eventually get to the point I needed. Thank you! I would not have been able to do it without your assistance! – Calum Apr 08 '20 at 03:08
  • Awesome, glad you found it helpful – Matt Apr 08 '20 at 19:58