Merge Multiple Data Frames with Different Feature Sets

Question

I tried searching and found similar-ish problems, but not the same problem. This seems like this should be somewhat common so I probably am just missing it.

I have multiple CSV files I am reading in with read.table. I want to merge these together such that rows with the same ID column are cbind() together, and the rest are left blank. An example below probably better illustrates

File 1

UserID    val1     val2    
  1                  2
  2        1         3

File 2

UserID    col1  
  1         a
  3         z

File 3

UserID    feat1    feat2
  1        Hi       Hello
  3        Moshi    Moshi

Desired Result

UserID      val1      val2    col1     feat1    feat2
1                      2        a       Hi       Hello
2            1         3
3                               z       Moshi    Moshi

I don't want it to add zeroes for missing values. I don't want it to fill anything with NAs, if possible.

I just want to combine datasets (3+ in all cases, average of 10) on a UserID column, keeping everything else the same.

I'm sure there is a tool for this, I just can't find it.

you can use reduce: `df = Reduce(function(...) merge(..., all = T), list(file_1, file_2, file_3))` as for fill value merging, easiest way i can think of is to do: `df[is.na(df)] = ''` this assumes that all `NA` values are treated the same in the dataset. otherwise you'll have to code different `NA`s witihn each dataframe not to lose information. — bringtheheat, Dec 13 '17 at 18:59
The correct way to do this will result in `NA` in non-matching positions. That's the correct value to place there. If you want something else, you'll need a second step to replace them. But it's sort of nonsensical to require they all be replaced with the same value if you have a mix of character/numeric columns. So you're better off sticking with NA. — joran, Dec 13 '17 at 19:01

score -1 · Answer 1 · answered Dec 13 '17 at 18:59

-1

The merge function sounds right for this.

merged_df <- merge(df1, df2, by="UserID")

You can then apply this iteratively, e.g.

for(i in 2:number_data_frames){
  merged_df <- merge(merged_df, list_of_dfs[[i]], by="UserID")
}

answered Dec 13 '17 at 18:59

ge.org

69
3

Merge Multiple Data Frames with Different Feature Sets

1 Answers1