0

data frame is 12 columns id, season, week, season_type, start_date, home_team, home_points, away_team, away_points, Line, home_cover, away_cover

        id season  week season_type start_date home_team        home_points away_team             away_points  Line home_cover away_cover
      <dbl>  <dbl> <dbl> <chr>            <dbl> <chr>                  <dbl> <chr>                       <dbl> <dbl> <chr>      <chr>     
1 400603840   2015     1 regular         42251. South Carolina            17 North Carolina                 13  -3.5 Y          N         
2 400763593   2015     1 regular         42251. UCF                       14 Florida International          15 -17   N          Y         
3 400763399   2015     1 regular         42251. Central Michigan          13 Oklahoma State                 24  20.5 Y          N         
4 400603839   2015     1 regular         42251  Vanderbilt                12 Western Kentucky               14 -17.5 N          Y         
5 400756883   2015     1 regular         42251. Utah                      24 Michigan                       17  -3   Y          N         
6 400763398   2015     1 regular         42251. Minnesota                 17 TCU                            23  16   Y          N         

what want is to find what team each away_team and home_team played the week before and cannot figure it out for life of me

r2evans
  • 141,215
  • 6
  • 77
  • 149
Drew
  • 5
  • 4
  • 4
    According to this data, nobody played anything the week before. I suggest you read https://stackoverflow.com/q/5963269, [mcve], and https://stackoverflow.com/tags/r/info for discussions on making a question self-contained and ***reproducible*** (key word), including data that is both minimal and representative of your problem. – r2evans Sep 24 '21 at 11:58
  • Yes that’s just the head week goes 1-16 season goes 2015-2020, how much more of the day would be needed for this question? – Drew Sep 24 '21 at 12:17
  • Think about it: how many weeks do *you need* to see in a solution that would justify in your mind that the method worked robustly and consistently? One week is obviously not enough; two weeks? It might be luck that it happened to work. Three weeks is getting better. My answer below shows four weeks, which I think does a reasonable job of demonstrating the problem. You can test with my data (it's completely reproducible) if you'd like, then adapt (column names and such) to your own data. – r2evans Sep 24 '21 at 12:33

1 Answers1

0

I'll suggest dplyr for this, as I think it's easier to understand the flow.

First, though, your sample data does not include multiple weeks, so it is insufficient to demonstrate this. Here's a little sample data (don't worry about understanding this part, it's just to create fake data):

library(dplyr)
set.seed(42)
dat <- bind_rows(lapply(1:4, function(w) data.frame(season=2021, week=w, home_team=sample(LETTERS[1:4]), away_team=sample(LETTERS[5:8]))))
dat
#    season week home_team away_team
# 1    2021    1         A         F
# 2    2021    1         D         H
# 3    2021    1         C         G
# 4    2021    1         B         E
# 5    2021    2         D         H
# 6    2021    2         C         E
# 7    2021    2         B         G
# 8    2021    2         A         F
# 9    2021    3         D         G
# 10   2021    3         B         E
# 11   2021    3         C         H
# 12   2021    3         A         F
# 13   2021    4         D         E
# 14   2021    4         A         F
# 15   2021    4         C         G
# 16   2021    4         B         H

From here, we'll (1) shift the data a week, then (2) "join" twice. (For a good read on the concept of merge/join, see How to join (merge) data frames (inner, outer, left, right) and What's the difference between INNER JOIN, LEFT JOIN, RIGHT JOIN and FULL JOIN?)

shifted <- dat %>%
  transmute(week = week + 1, home_prevaway = home_team, away_prevaway = away_team)
left_join(dat, shifted, by = c("week", "home_team" = "home_prevaway")) %>%
  left_join(., shifted, by = c("week", "away_team" = "away_prevaway"))
#    season week home_team away_team away_prevaway home_prevaway
# 1    2021    1         A         F          <NA>          <NA>
# 2    2021    1         D         H          <NA>          <NA>
# 3    2021    1         C         G          <NA>          <NA>
# 4    2021    1         B         E          <NA>          <NA>
# 5    2021    2         D         H             H             D
# 6    2021    2         C         E             G             B
# 7    2021    2         B         G             E             C
# 8    2021    2         A         F             F             A
# 9    2021    3         D         G             H             B
# 10   2021    3         B         E             G             C
# 11   2021    3         C         H             E             D
# 12   2021    3         A         F             F             A
# 13   2021    4         D         E             G             B
# 14   2021    4         A         F             F             A
# 15   2021    4         C         G             H             D
# 16   2021    4         B         H             E             C

home_prevaway is the previous team the current home team played; similarly for away_prevaway.

For verification, in week 1 team "B" (home) played team "E" (away), and team "D" played team "H". In week 2, B's home_prevaway is "E", and D's home_prevaway is "H". (The fact that teams A and D played the same away two weeks in a row is an inconvenient coincidence given the random data.)

r2evans
  • 141,215
  • 6
  • 77
  • 149