0

I have 7 big data.frames with sport statistic. They have different number of rows and different number of columns but some of columns have similar value (game id, player id so I've created game_player_id to merge them). In few of these data.frames some of rows don't exists (player hasn't played) and in others they exist because these players are listed as DNP. I would like to merge these 7 frames into one big one with all players (active and not). merge() works for frames with the same number of rows but it doesn't when I'm trying to run function. As a result I got only columns names and

<0 rows> (or 0-length row.names)

GAME_PLAYER_ID MIN AST
001001         86  1
001002         90  0
001003         30  0
001004         0   DNP
001005         90  2

GAME_PLAYER_ID MIN PASS
001001         86  25
001002         90  45
001003         30  25
001005         90  39

expected result

GAME_PLAYER_ID MIN AST PASS
001001         86  1   25
001002         90  0   45
001003         30  0   25
001004         DNP   DNP DNP/NA (whatever but not 0)
001005         90  2   39

Where is the problem? How to do this in simple way? I probably may write a function but it's gonna be complicated (a lot of rows, and more than 20 columns in every data.frame), is there any package which can help me handle with these?

AlienDeg
  • 1,288
  • 1
  • 13
  • 23

1 Answers1

2

You want to perform a Full Outer Join: What is the difference between "INNER JOIN" and "OUTER JOIN"?

> df1 = read.table(header = TRUE, text = 
+ "GAME_PLAYER_ID MIN AST
+ 001001         86  1
+ 001002         90  0
+ 001003         30  0
+ 001004         0   DNP
+ 001005         90  2")
> 
> df2 = read.table(header = TRUE, text = 
+ "GAME_PLAYER_ID MIN PASS
+ 001001         86  25
+ 001002         90  45
+ 001003         30  25
+ 001005         90  39")
> 
> df3 = merge(df1, df2, all = TRUE)
> df3
  GAME_PLAYER_ID MIN AST PASS
1           1001  86   1   25
2           1002  90   0   45
3           1003  30   0   25
4           1004   0 DNP   NA
5           1005  90   2   39

This will put NA in any cells not populated by the input dataframes. For future reference, I've gone to this question more times than I could count: How to join (merge) data frames (inner, outer, left, right)?

Community
  • 1
  • 1
kevinsa5
  • 3,301
  • 2
  • 25
  • 28
  • Thank you for that. I didn't see this post in recommended and I probably asked wrong question in Google. That helped. – AlienDeg Aug 03 '15 at 04:56
  • Knowing the right words to search for is 90% of the work, I've found. Hopefully this question will show up for people in a similar position. – kevinsa5 Aug 03 '15 at 04:57
  • @AlienDeg And see the comment to your question. –  Aug 03 '15 at 05:00
  • Yep. It's my first coding language so I didn't have idea about inner/outer thing. Can I get rid of duplicated columns? I can see suffixes, but maybe something to not have them? – AlienDeg Aug 03 '15 at 05:00
  • You can merge by more than one column in that case you do something like this: `df3 = merge(df1, df2,by=c("name of first column in both df","name of second column in both df"), all = TRUE)` – Sarina Aug 03 '15 at 11:00