Combining datasets

Question

I have 15 datasets. The first column is "subject" and is identical in all of them. The number of remaining columns differs between datasets. I need to combine all of this data into a single data frame. I found the function `Reduce`, but I am just starting with R and couldn't work out whether it is what I need and, if so, what the syntax is. Thanks!

On SO you really need to provide an example of what you have tried first. I wouldn't just ask people to solve your problems without at least showing you've tried various solutions. Post some code and explain why it doesn't work for you. — Single Entity, Apr 18 '17 at 22:28
Does each data frame have different types of data on the same universe of subjects or does each data frame have data on different subjects? — eipi10, Apr 18 '17 at 22:31
`Reduce` is nothing to do with combining data frames. You may want to look at `bind_rows` in the `dplyr` package, but it's difficult to know what will work without more details of the data. — neilfws, Apr 18 '17 at 22:34
Maybe something like `dfnew = Reduce(dplyr::bind_rows, list(df1,df2,df3))` if you want to "stack" your data, or `dfnew = Reduce(dplyr::full_join, list(df1, df2, df3))` if you want to combine columns from different data frames based on one or more "key" columns (such as `subject` in your case). In both cases, `list(df1, df2, df3, ...)` is a list containing the names of all of your data frames (although, instead of creating the list by hand, it would be easier if you read all of the data frames into a list to begin with). — eipi10, Apr 18 '17 at 22:50

score 0 · Answer 1 · edited May 23 '17 at 12:02

I suggest including a reproducible example in the future so that others can see the format of data you're working with and what you're trying to do.

Here are some randomly generated example data frames, each with a "Subject" column:

list_of_dfs <- list(
   df1 = data.frame(Subject = 1:4, a = rnorm(4), b = rnorm(4)),
   df2 = data.frame(Subject = 5:8, c = rnorm(4), d = rnorm(4), e = rnorm(4)),
   df3 = data.frame(Subject = 7:10, f = rnorm(4)),
   df4 = data.frame(Subject = 2:5, g = rnorm(4), h = rnorm(4))
)

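`Reduce` with `merge` is a good choice. It merges the data frames pairwise on "Subject", and `all = TRUE` keeps every subject, filling in `NA` where a data frame has no row for that subject: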

combined_df <- Reduce(
  function(x, y) { merge(x, y, by = "Subject", all = TRUE) },
  list_of_dfs
)

And the output:

> combined_df
   Subject          a          b          c           d         e          f          g          h
1        1  1.1106594  1.2530046         NA          NA        NA         NA         NA         NA
2        2 -1.0275630  0.6437101         NA          NA        NA         NA -1.9393347 -0.4361952
3        3  0.1558639  1.2792212         NA          NA        NA         NA -0.8861966  1.0137530
4        4  0.4283585 -0.1045530         NA          NA        NA         NA  1.8924896 -0.3788198
5        5         NA         NA 0.08261190  0.77058804 -1.165042         NA  0.7950784 -1.3467386
6        6         NA         NA 2.51214598  0.62024328  1.496520         NA         NA         NA
7        7         NA         NA 0.01581309 -0.04777196 -1.327884  1.5111734         NA         NA
8        8         NA         NA 0.80448136 -0.33347573 -2.290428 -0.3863564         NA         NA
9        9         NA         NA         NA          NA        NA -1.2371795         NA         NA
10      10         NA         NA         NA          NA        NA  1.6819063         NA         NA
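
As suggested in the comments, it is usually easier to read the files straight into a list than to build the list by hand. The following is only a minimal sketch under assumptions: the file names (`set1.csv` and so on), the temporary directory, and the variable names `dfs_from_files` and `combined_from_files` are hypothetical, and your data may not be in CSV format at all. It writes three small files first so that it runs on its own, then reads them back and combines them with the same `Reduce` and `merge` pattern as above.

```r
# Hypothetical setup: write three small CSV files to a temporary directory
# so the sketch is self-contained. In practice these would be your own files.
tmp_dir <- tempfile("datasets_")
dir.create(tmp_dir)
write.csv(data.frame(Subject = 1:3, a = c(0.1, 0.2, 0.3)),
          file.path(tmp_dir, "set1.csv"), row.names = FALSE)
write.csv(data.frame(Subject = 2:4, b = c(1.5, 2.5, 3.5), c = c("x", "y", "z")),
          file.path(tmp_dir, "set2.csv"), row.names = FALSE)
write.csv(data.frame(Subject = 3:5, d = c(10, 20, 30)),
          file.path(tmp_dir, "set3.csv"), row.names = FALSE)

# Read every CSV file in the directory into one list of data frames.
files <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)
dfs_from_files <- lapply(files, read.csv)

# Reduce folds the list pairwise, i.e. merge(merge(set1, set2), set3).
# all = TRUE keeps subjects that are missing from some of the files.
combined_from_files <- Reduce(
  function(x, y) merge(x, y, by = "Subject", all = TRUE),
  dfs_from_files
)
```

With 15 files the code stays the same, since `list.files` and `lapply` pick up however many files match the pattern. If the dplyr package is installed, `dplyr::full_join` with `by = "Subject"` could be used in place of `merge` inside the function passed to `Reduce`.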
