0

I have one data frame with English Premier League team ratings and another with the full season schedule. I want to attach each team's rating to the schedule data frame as a variable so that I can compute the probabilities for each game. A later step would be to simulate the entire season.

I tried writing an if statement to match the character strings in df_1 to those in df_2, but I do not think I am on the right path.

I am sure this is low-level coding to most, and I appreciate the help. I tried working on it before coming here. Thank you sincerely.

library(dplyr)  # provides tibble(), the join verbs and %>%

vec_1 <- c("team_a", "team_b", "team_c")
vec_2 <- c(1.7, 1.2, 0.8)
vec_3 <- c("team_d", "team_e", "team_f")
vec_4 <- c(0.3, 0.5, 0.4)

# df_1 ratings df

df_1 <- tibble(team = vec_1, rating = vec_2)

  team   rating
  <chr>   <dbl>
1 team_a    1.7
2 team_b    1.2
3 team_c    0.8

# df_2 schedule df

df_2 <- tibble(home_tm = vec_1, away_tm = vec_3)

  home_tm away_tm
  <chr>   <chr>  
1 team_a  team_d 
2 team_b  team_e 
3 team_c  team_f 

Desired outcome:

  home_tm away_tm home_tm_rat away_tm_rat
  <chr>   <chr>         <dbl>       <dbl>
1 team_a  team_d          1.7         0.3
2 team_b  team_e          1.2         0.5
3 team_c  team_f          0.8         0.4
......
......
......
  • Check out the `join` functions from `dplyr` – Sonny May 07 '19 at 12:20
  • Possible duplicate of [How to join (merge) data frames (inner, outer, left, right)](https://stackoverflow.com/questions/1299871/how-to-join-merge-data-frames-inner-outer-left-right) – Wil May 07 '19 at 12:32

2 Answers

1

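As the comment above suggests, you can use the join functions from dplyr. Join df_1 onto the schedule twice, once keyed on the home team and once on the away team, renaming the rating column after each join. Note that the output shown assumes df_1 holds a rating for every team in the schedule; with df_1 exactly as defined in the question, the away teams have no match and away_tm_rat would be NA.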

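library(dplyr)

df_2 %>%
  # look up each home team's rating, then rename it before the second join
  left_join(df_1, by = c('home_tm' = 'team')) %>%
  rename(home_tm_rat = rating) %>%
  # repeat the lookup for the away team
  left_join(df_1, by = c('away_tm' = 'team')) %>%
  rename(away_tm_rat = rating)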

# A tibble: 3 x 4
  home_tm away_tm home_tm_rat away_tm_rat
  <chr>   <chr>         <dbl>       <dbl>
1 team_a  team_d          1.7         0.3
2 team_b  team_e          1.2         0.5
3 team_c  team_f          0.8         0.4
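
For anyone who wants to reproduce the result above, here is a minimal self-contained sketch. It assumes the ratings table covers all six teams, pairing vec_3 with vec_4 for the away sides; that pairing is my reading of the desired outcome, not something the question's df_1 contains. The names `ratings`, `schedule`, `result` and `unmatched` are placeholders of my own.

```r
library(dplyr)

# Assumption: every team in the schedule has a rating. The question's df_1
# only lists team_a to team_c, so the away teams are added here by hand.
ratings <- tibble(
  team   = c("team_a", "team_b", "team_c", "team_d", "team_e", "team_f"),
  rating = c(1.7, 1.2, 0.8, 0.3, 0.5, 0.4)
)

schedule <- tibble(
  home_tm = c("team_a", "team_b", "team_c"),
  away_tm = c("team_d", "team_e", "team_f")
)

# Join the ratings twice: once keyed on the home team, once on the away team.
result <- schedule %>%
  left_join(ratings, by = c("home_tm" = "team")) %>%
  rename(home_tm_rat = rating) %>%
  left_join(ratings, by = c("away_tm" = "team")) %>%
  rename(away_tm_rat = rating)

# Rows with an NA rating point to a schedule name that has no exact match
# in the ratings table (for example a spelling difference).
unmatched <- result %>%
  filter(is.na(home_tm_rat) | is.na(away_tm_rat))

print(result)
print(unmatched)
```

With real EPL data, an empty `unmatched` is a quick way to check that team names are spelled identically in both tables before moving on to probabilities.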
liuminzhao
0

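Similar to @liuminzhao's answer, but I'd also recommend thinking about your data structure a little. Things will be easier if all the teams in df_2 sit in a single column, with a separate column indicating whether each one is home or away. This is the idea behind tidy data, which is worth reading up on. In the output shown, the away teams have NA ratings because df_1 as defined in the question only contains team_a to team_c.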

library(tidyverse)

df_2 %>% 
  # gather the two team columns into a single column, with another column indicating home/away
  gather(key = HomeAway, value = team) %>% 
  # join the team ratings
  left_join(df_1, by = "team")


# A tibble: 6 x 3
  HomeAway team   rating
  <chr>    <chr>   <dbl>
1 home_tm  team_a    1.7
2 home_tm  team_b    1.2
3 home_tm  team_c    0.8
4 away_tm  team_d   NA  
5 away_tm  team_e   NA  
6 away_tm  team_f   NA  
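
One caveat with the reshaping above: once the two columns are stacked, nothing records which two rows belong to the same fixture. A possible sketch that keeps a game identifier is shown here, using `pivot_longer()` (available in tidyr 1.0.0 and later) in place of `gather()`. The `game_id` column, the object names and the six-team ratings table are all assumptions of mine, not part of the question's data.

```r
library(dplyr)
library(tidyr)

# Assumption: ratings exist for all six teams (the question's df_1 has three).
ratings <- tibble(
  team   = c("team_a", "team_b", "team_c", "team_d", "team_e", "team_f"),
  rating = c(1.7, 1.2, 0.8, 0.3, 0.5, 0.4)
)

schedule <- tibble(
  home_tm = c("team_a", "team_b", "team_c"),
  away_tm = c("team_d", "team_e", "team_f")
)

long <- schedule %>%
  # hypothetical identifier so both rows of a fixture stay linked
  mutate(game_id = row_number()) %>%
  # one row per team per game, with a column saying home or away
  pivot_longer(
    cols      = c(home_tm, away_tm),
    names_to  = "home_away",
    values_to = "team"
  ) %>%
  # a single join now covers home and away teams alike
  left_join(ratings, by = "team")

# The identifier makes it possible to return to one row per game when needed.
wide <- long %>%
  pivot_wider(
    id_cols     = game_id,
    names_from  = home_away,
    values_from = c(team, rating)
  )

print(long)
print(wide)
```

The long table needs only one join, and `game_id` is what lets you group by fixture or reshape back to one row per game later.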
Jordo82