0

I have a data frame of sports games and want to get the average of the previous n points. For example, my data frame looks like this:

 Home     Away   Home    Away 
 Team     Team   Points  Points

1 Red    Green    2       1     
2 Green  Blue     3       4
3 Blue   Red      4       3
4 Red    Blue     2       4
5 Blue   Green    4       2
6 Green  Red      3       3

I'd like to have a new column for home team and away team average points in their last 2 games. I think that by creating a new column with both team names in it, I should be able to use str_detect to get the average points over the last n occurrences.

For example, In the above example, I'd like to take an average of the points scored for both teams over their last 2 games. What I'd like the outcome to look like is this:

 Home     Away   Home    Away    Game                Avg Points      Avg Points   
 Team     Team   Points  Points  ID                  (Home Team)     (Away Team)

1 Red    Green    2       1      2020_1_Red_Green     NA              NA
2 Green  Blue     3       4      2020_2_Green_Blue    NA              NA
3 Blue   Red      4       3      2020_3_Blue_Red      NA              NA
4 Red    Blue     2       4      2020_4_Red_Blue      2.5             4
5 Blue   Green    4       2      2020_5_Blue_Green    4               2
6 Green  Red      3       3      2020_Green_Red       2.5             2.5

I think the answer will require some type of str_detect, but I'm not sure how to find the last n number of games that either team has played.

What is a way to accomplish this?

ben wolf
  • 1
  • 1
  • I think i have seen same question elsewhere! – AnilGoyal Sep 26 '20 at 16:38
  • Does this answer your question? [Moving average of previous three values in R](https://stackoverflow.com/questions/16193333/moving-average-of-previous-three-values-in-r) – AnilGoyal Sep 26 '20 at 16:43

2 Answers2

0

Here's some dummy data which includes game_id to make joining easier.

library(tidyverse)

df <- tribble(
   ~game_id, ~home_team, ~away_team, ~home_points, ~away_points,
          1,      "Red",    "Green",            2,            1,
          2,    "Green",     "Blue",            3,            4,
          3,     "Blue",      "Red",            4,            3,
          4,      "Red",     "Blue",            2,            4,
          5,     "Blue",    "Green",            4,            2,
          6,    "Green",      "Red",            3,            3
)

Combine the home and away data into a longer data frame. The 2 game average can be calculated with lag.

avg_points <-
  bind_rows(
    df %>% transmute(game_id, team = home_team, points = home_points),
    df %>% transmute(game_id, team = away_team, points = away_points)
  ) %>%
    group_by(team) %>%
    arrange(game_id) %>%
    mutate(avg = (lag(points) + lag(points, 2)) / 2.0)
avg_points
#> # A tibble: 12 x 4
#> # Groups:   team [3]
#>    game_id team  points   avg
#>      <dbl> <chr>  <dbl> <dbl>
#>  1       1 Red        2  NA  
#>  2       1 Green      1  NA  
#>  3       2 Green      3  NA  
#>  4       2 Blue       4  NA  
#>  5       3 Blue       4  NA  
#>  6       3 Red        3  NA  
#>  7       4 Red        2   2.5
#>  8       4 Blue       4   4  
#>  9       5 Blue       4   4  
#> 10       5 Green      2   2  
#> 11       6 Green      3   2.5
#> 12       6 Red        3   2.5

Join the original data frame onto the new one.

df %>%
  inner_join(avg_points %>% select(game_id, home_team = team, avg))
#> Joining, by = c("game_id", "home_team")
#> # A tibble: 6 x 6
#>   game_id home_team away_team home_points away_points   avg
#>     <dbl> <chr>     <chr>           <dbl>       <dbl> <dbl>
#> 1       1 Red       Green               2           1  NA  
#> 2       2 Green     Blue                3           4  NA  
#> 3       3 Blue      Red                 4           3  NA  
#> 4       4 Red       Blue                2           4   2.5
#> 5       5 Blue      Green               4           2   4  
#> 6       6 Green     Red                 3           3   2.5
Paul
  • 8,734
  • 1
  • 26
  • 36
0

Here is another approach using pivot_longer and pivot_wider from tidyr.

The game_number is based on the row number, and game_id is created by pasting the team names together with the game_number (and adding year).

The pivot_longer will allow teams to be grouped together, to determine the last 2 games points. If there are less than 2 prior games, it sets to NA.

The pivot_wider will put the calculated averages into two columns (one for home, one for away).

library(tidyverse)

df %>%
  mutate(game_number = row_number(),
         game_id = paste(2020, game_number, home_team, away_team, sep = "_")) %>%
  pivot_longer(cols = -c(game_number, game_id), names_to = c("location", ".value"), names_sep = "_") %>%
  group_by(team) %>%
  mutate(average_points = ifelse(row_number() < 2, NA, (lag(points, 1L) + lag(points, 2L)) / 2)) %>%
  pivot_wider(id_cols = c(game_number, game_id), names_from = location, values_from = average_points, names_prefix = "avg_pts_")

Output

  game_number game_id           avg_pts_home avg_pts_away
        <int> <chr>                    <dbl>        <dbl>
1           1 2020_1_Red_Green          NA           NA  
2           2 2020_2_Green_Blue         NA           NA  
3           3 2020_3_Blue_Red           NA           NA  
4           4 2020_4_Red_Blue            2.5          4  
5           5 2020_5_Blue_Green          4            2  
6           6 2020_6_Green_Red           2.5          2.5

Data

df <- structure(list(home_team = c("Red", "Green", "Blue", "Red", "Blue", 
"Green"), away_team = c("Green", "Blue", "Red", "Blue", "Green", 
"Red"), home_points = c(2, 3, 4, 2, 4, 3), away_points = c(1, 
4, 3, 4, 2, 3)), class = "data.frame", row.names = c(NA, -6L))
Ben
  • 28,684
  • 5
  • 23
  • 45