0

I’m just getting started with R and putting together a sports database. I have dataframe A, which has two variables: Player_ID and Player_Name. I have dataframe B which is a robust table full of season stats for every player but fails to have their player ID. I would like to make either a change to DB B which includes that players ID, or somehow merge the two where I essentially have Database C which has an additional variable next to player name which would be their ID (from DB A). Any guidance would be very much appreciated!

  • Sounds like you need `merge` or one of the `dplyr` join functions. Difficult to help further unless you provide some example data, in a plan text format, from each of the dataframes. – neilfws Aug 06 '19 at 23:26
  • I need to take the PlayerIDs from one dataframe and insert it into the dataframe of the other (based on the player name, it pulls in the correct ID) – MadtownBrian Aug 06 '19 at 23:31
  • 2
    [See here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) on making an R question that folks can help with. That includes a sample of data, all necessary code, and a clear explanation of what you're trying to do and what hasn't worked. As has been mentioned, it's probably a join, which has been done in a lot of other SO posts already – camille Aug 06 '19 at 23:32
  • Assuming that dataframe B contains column `Player_Name` then something like `dfC <- merge(dfA, dfB, by = "Player_Name")` should work, assuming unique player names. – neilfws Aug 06 '19 at 23:55

1 Answers1

0

There is an excellent book available here - R for Data Science. You can read it online. The book is easy to carry around for a reference. You can read about Mutating joins for a relational data. As indicated by others in the comment above, give a reproducible example so it'll be easy to give a more directed answer.

On the basis of your description, here is an example of a right_join that can get you what you want.

# A tibble: 4 x 2 (data frame A)
  Player_Id Name  
      <dbl> <chr> 
1         1 Name1 
2         2 Name2 
3         3 Name3 
4         4 Uknown  
# -------------------------------------------------------------------------
# A tibble: 4 x 2 (data frame B)
  Player Stat_score
  <chr>       <dbl>
1 Name1         193
2 Name2         222
3 Name3         333
4 Name4         444
# -------------------------------------------------------------------------
# The right join between column Name of df-A and Player of df-B using dplyr
library(dplyr)
C <- A %>%
right_join(B, c("Name" = "Player"))
# -------------------------------------------------------------------------
# the join results a df-C with 'NA' for those names in A but not in B
# A tibble: 4 x 3 (data frame C)
  Player_Id Name  Stat_score
      <dbl> <chr>      <dbl>
1         1 Name1        193
2         2 Name2        222
3         3 Name3        333
4        NA Name4        444
deepseefan
  • 3,701
  • 3
  • 18
  • 31
  • Brilliant. Absolutely brilliant. Every day I am amazed at what R (along with dplyr) can produce with such efficient pieces of code. Many thanks for your help! – MadtownBrian Aug 07 '19 at 13:05