0

Hello so I have two data sets I am trying to combine. The problem is I need to combine them by a certain column and certain values in the column. They both have a column called player_id. One data set has only players ids. First data set

The second data set has the number of home runs a player had and the player id. The problem is the second dataset has a ton of irrelevant players I don't need. So I need to merge them just by the player ids from data set 1. data set 2

Here is my code, the combine does not work because It brings all of the players ids from both data sets.

player2 = subset(player, select = c(player_id, birth_state))

player.mt <- player[ which(player$birth_state =='MT'),]
player.mt2 = subset(player.mt, select = c(player_id))
batting.hr <- subset(batting, select = c(player_id, hr))
batting.hr

combine <- merge(player.mt2, batting.hr, by=c("player_id"), all=TRUE) 
packerfan
  • 41
  • 1
  • 1
  • 4

4 Answers4

1

This is a simple and common problem, search around for it a bit. What you want is an inner merge where you keep data only if the id column is in both. One character change is the difference.

combine <- merge(player.mt2, batting.hr, by=c("player_id"), all=F) 

Alternatively, if you wanted to keep all those in the player dataset (regardless of whether or not they were present in homeruns) and not all those in the homerun dataset you could do:

combine <- merge(player.mt2, batting.hr, by=c("player_id"), all.x=T, all.y=F) 

It all comes down to the all part of your code. The documentation is pretty self explanatory in ?merge() and this question is answered all over here and elsewhere.

cparmstrong
  • 799
  • 6
  • 23
0

You need to use:

combine<-plyr::join_all(list(player.mt2, batting.hr), by=c("player_id"),type="inner")
JeanVuda
  • 1,738
  • 14
  • 29
0

You basically have it, just change the all argument to keep those you are interested in:

combine <- merge(x=player.mt2, y=batting.hr, by=c("player_id"), all.x=TRUE) 
Nate
  • 364
  • 1
  • 5
0

What you need is a left join. I would use the dplyr package.

combine <- left_join(player.mt2, battling.hr, by = "player_id")
dasds
  • 170
  • 1
  • 11