I’m just getting started with R and putting together a sports database. I have dataframe A, which has two variables: Player_ID and Player_Name. I have dataframe B which is a robust table full of season stats for every player but fails to have their player ID. I would like to make either a change to DB B which includes that players ID, or somehow merge the two where I essentially have Database C which has an additional variable next to player name which would be their ID (from DB A). Any guidance would be very much appreciated!
Asked
Active
Viewed 35 times
0
-
Sounds like you need `merge` or one of the `dplyr` join functions. Difficult to help further unless you provide some example data, in a plan text format, from each of the dataframes. – neilfws Aug 06 '19 at 23:26
-
I need to take the PlayerIDs from one dataframe and insert it into the dataframe of the other (based on the player name, it pulls in the correct ID) – MadtownBrian Aug 06 '19 at 23:31
-
2[See here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) on making an R question that folks can help with. That includes a sample of data, all necessary code, and a clear explanation of what you're trying to do and what hasn't worked. As has been mentioned, it's probably a join, which has been done in a lot of other SO posts already – camille Aug 06 '19 at 23:32
-
Assuming that dataframe B contains column `Player_Name` then something like `dfC <- merge(dfA, dfB, by = "Player_Name")` should work, assuming unique player names. – neilfws Aug 06 '19 at 23:55
1 Answers
0
There is an excellent book available here - R for Data Science. You can read it online. The book is easy to carry around for a reference. You can read about Mutating joins for a relational data. As indicated by others in the comment above, give a reproducible example so it'll be easy to give a more directed answer.
On the basis of your description, here is an example of a right_join
that can get you what you want.
# A tibble: 4 x 2 (data frame A)
Player_Id Name
<dbl> <chr>
1 1 Name1
2 2 Name2
3 3 Name3
4 4 Uknown
# -------------------------------------------------------------------------
# A tibble: 4 x 2 (data frame B)
Player Stat_score
<chr> <dbl>
1 Name1 193
2 Name2 222
3 Name3 333
4 Name4 444
# -------------------------------------------------------------------------
# The right join between column Name of df-A and Player of df-B using dplyr
library(dplyr)
C <- A %>%
right_join(B, c("Name" = "Player"))
# -------------------------------------------------------------------------
# the join results a df-C with 'NA' for those names in A but not in B
# A tibble: 4 x 3 (data frame C)
Player_Id Name Stat_score
<dbl> <chr> <dbl>
1 1 Name1 193
2 2 Name2 222
3 3 Name3 333
4 NA Name4 444

deepseefan
- 3,701
- 3
- 18
- 31
-
Brilliant. Absolutely brilliant. Every day I am amazed at what R (along with dplyr) can produce with such efficient pieces of code. Many thanks for your help! – MadtownBrian Aug 07 '19 at 13:05