I'm extremely new to R and to coding in general. My intuition is that this should have a very basic answer, so feel free to send me back to a basic intro class if this is too basic to spend your time on.

To make things easier, I will reduce my problem to a much simpler situation with the same salient features.

I have two dataframes. The first shows how many games some people played as "white". The second shows how many games some people played as "black". Some players played as both white and black; others played in only one of these roles.

I would like to merge these two dataframes into one showing all players who have played in either role and how many total games they played, whether as white or black.

A reproducible example:

player_as_white <- c('John', 'Max', 'Grace', 'Zoe', 'Peter')
games_white <- c(sample(1:20,5))
dat1 <- data.frame(player_as_white, games_white)
player_as_black <- c('John', 'Eddie', 'Zoe')
games_black <- c(sample(1:20, 3))
dat2 <- data.frame(player_as_black, games_black)

How do I get a consolidated dataset showing how many total games all 6 players have played, whether as white or black?

Thanks!

bbip
    Welcome to SO. Take a look at this [post](https://stackoverflow.com/questions/1299871/how-to-join-merge-data-frames-inner-outer-left-right) that explains in detail how different types of joins work – mnm Jul 03 '19 at 23:03

1 Answer

For reproducibility, it's good practice to set a random seed so the example produces the same values each time it is run, both for you and for others. I'd also suggest using stringsAsFactors = FALSE so that the names are treated as characters and not factors, which will make this a little simpler. (edit: But it should work fine here with the default, too.)

set.seed(0)
player_as_white <- c('John', 'Max', 'Grace', 'Zoe', 'Peter')
games_white <- sample(1:20, 5)
dat1 <- data.frame(player_as_white, games_white, stringsAsFactors = FALSE)
player_as_black <- c('John', 'Eddie', 'Zoe')
games_black <- sample(1:20, 3)
dat2 <- data.frame(player_as_black, games_black, stringsAsFactors = FALSE)

Then we can use merge to combine the two:

merge(dat1, dat2, by.x = "player_as_white", by.y = "player_as_black", all = TRUE)

#  player_as_white games_white games_black
#1           Eddie          NA          18
#2           Grace           7          NA
#3            John          18           5
#4             Max           6          NA
#5           Peter          15          NA
#6             Zoe          10          19
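
The question also asks for one total per player. A minimal base R sketch of that step, assuming a player missing from one table played zero games in that colour. The values are hard-coded from the printed output above so the result does not depend on the random number generator, and the names merged, totals, player and games_total are only illustrative:

```r
# Data hard-coded from the output shown above
dat1 <- data.frame(player_as_white = c("John", "Max", "Grace", "Zoe", "Peter"),
                   games_white = c(18L, 6L, 7L, 10L, 15L),
                   stringsAsFactors = FALSE)
dat2 <- data.frame(player_as_black = c("John", "Eddie", "Zoe"),
                   games_black = c(5L, 18L, 19L),
                   stringsAsFactors = FALSE)

# Full outer join: keep players that appear in either table
merged <- merge(dat1, dat2, by.x = "player_as_white",
                by.y = "player_as_black", all = TRUE)

# Sum across the two game columns, ignoring NA (i.e. counting it as zero)
merged$games_total <- rowSums(merged[, c("games_white", "games_black")],
                              na.rm = TRUE)

# Keep only the player name and the total
totals <- data.frame(player = merged$player_as_white,
                     games_total = unname(merged$games_total),
                     stringsAsFactors = FALSE)
totals
```

rowSums with na.rm = TRUE avoids having to overwrite the NA values first, so the merged table still shows who never played a given colour.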

Or a dplyr solution, which keeps the row order from dat1 and appends the players found only in dat2:

library(dplyr)
full_join(dat1, dat2, by = c("player_as_white" = "player_as_black"))

#  player_as_white games_white games_black
#1            John          18           5
#2             Max           6          NA
#3           Grace           7          NA
#4             Zoe          10          19
#5           Peter          15          NA
#6           Eddie          NA          18
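
To go from the joined table to one total per player with dplyr, something along these lines should work. It is a sketch that assumes NA means zero games in that colour; the data is hard-coded from the output above, and totals, player and games_total are illustrative names rather than anything from the question:

```r
library(dplyr)

# Data hard-coded from the output shown above
dat1 <- data.frame(player_as_white = c("John", "Max", "Grace", "Zoe", "Peter"),
                   games_white = c(18L, 6L, 7L, 10L, 15L),
                   stringsAsFactors = FALSE)
dat2 <- data.frame(player_as_black = c("John", "Eddie", "Zoe"),
                   games_black = c(5L, 18L, 19L),
                   stringsAsFactors = FALSE)

totals <- full_join(dat1, dat2,
                    by = c("player_as_white" = "player_as_black")) %>%
  # coalesce() replaces NA with 0L before adding the two columns
  mutate(games_total = coalesce(games_white, 0L) + coalesce(games_black, 0L)) %>%
  # keep the player name (renamed) and the total
  select(player = player_as_white, games_total)

totals
```

Using 0L rather than 0 keeps the replacement value the same integer type as the game counts.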
Jon Spring
  • What difference does it make whether the strings are character or factor, at least in this question? I would think it matters only when there is a need for explicit string-related operations, like summarizing the factors. Can you elaborate on it? – mnm Jul 03 '19 at 23:08
  • Upon further research I learned that `merge` should handle the combination of the two factors fine. https://stackoverflow.com/a/23814926/6851825 Nonetheless, I recall finding factors very confusing when I first started with R, and my personal preference is to convert to factors deliberately rather than by default. – Jon Spring Jul 03 '19 at 23:13
  • Keeping personal preferences aside, I think there is a scientific reason behind the difference between character and factor, as elaborated in https://stackoverflow.com/questions/8652694/r-use-of-factor and https://datascience.stackexchange.com/questions/12018/when-to-choose-character-instead-of-factor-in-r – mnm Jul 03 '19 at 23:22