I have 15 datasets. The first column is "subject" and is identical in all of them. The number of remaining columns differs between datasets. I need to combine all of this data into a single data frame. I found the function `Reduce`, but I am just starting with R and couldn't tell whether it is what I need and, if so, what the syntax is. Thanks!
-
On SO you need to provide an example of what you have tried first really, I wouldn't just ask people to solve your problems without at least showing you've tried various solutions first. Post some code and explain why it doesn't work for you. – Single Entity Apr 18 '17 at 22:28
-
Does each data frame have different types of data on the same universe of subjects or does each data frame have data on different subjects? – eipi10 Apr 18 '17 at 22:31
-
`Reduce` is nothing to do with combining data frames. You may want to look at `bind_rows` in the `dplyr` package, but it's difficult to know what will work without more details of the data. – neilfws Apr 18 '17 at 22:34
-
Maybe something like `dfnew = Reduce(dplyr::bind_rows, list(df1,df2,df3))` if you want to "stack" your data, or `dfnew = Reduce(dplyr::full_join, list(df1, df2, df3))` if you want to combine columns from different data frames based on one or more "key" columns (such as `subject` in your case). In both cases, `list(df1, df2, df3, ...)` is a list containing the names of all of your data frames (although, instead of creating the list by hand, it would be easier if you read all of the data frames into a list to begin with). – eipi10 Apr 18 '17 at 22:50
1 Answer
I suggest including a reproducible example in the future so that others can see the format of data you're working with and what you're trying to do.
Here are some randomly generated example data frames, each with a "Subject" column:
    list_of_dfs <- list(
      df1 = data.frame(Subject = 1:4, a = rnorm(4), b = rnorm(4)),
      df2 = data.frame(Subject = 5:8, c = rnorm(4), d = rnorm(4), e = rnorm(4)),
      df3 = data.frame(Subject = 7:10, f = rnorm(4)),
      df4 = data.frame(Subject = 2:5, g = rnorm(4), h = rnorm(4))
    )
`Reduce` with `merge` is a good choice:
    combined_df <- Reduce(
      function(x, y) { merge(x, y, by = "Subject", all = TRUE) },
      list_of_dfs
    )
And the output:
    > combined_df
       Subject          a          b          c           d         e          f          g          h
     1       1  1.1106594  1.2530046         NA          NA        NA         NA         NA         NA
     2       2 -1.0275630  0.6437101         NA          NA        NA         NA -1.9393347 -0.4361952
     3       3  0.1558639  1.2792212         NA          NA        NA         NA -0.8861966  1.0137530
     4       4  0.4283585 -0.1045530         NA          NA        NA         NA  1.8924896 -0.3788198
     5       5         NA         NA 0.08261190  0.77058804 -1.165042         NA  0.7950784 -1.3467386
     6       6         NA         NA 2.51214598  0.62024328  1.496520         NA         NA         NA
     7       7         NA         NA 0.01581309 -0.04777196 -1.327884  1.5111734         NA         NA
     8       8         NA         NA 0.80448136 -0.33347573 -2.290428 -0.3863564         NA         NA
     9       9         NA         NA         NA          NA        NA -1.2371795         NA         NA
    10      10         NA         NA         NA          NA        NA  1.6819063         NA         NA
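For the situation in the question (many datasets sharing one key column), the same pattern might look roughly like the sketch below. It assumes the datasets are CSV files in a single folder and that the key column is named `subject`; the folder, file names, and the helper `merge_by_subject` are made up for illustration. It also writes the fold out by hand once, to show that `Reduce` is simply applying `merge` pairwise from left to right.

```r
# Hypothetical setup: write three small CSV files to a temporary folder so the
# sketch is self-contained. With real data, point `data_dir` at your own folder.
data_dir <- tempfile("subject_data")
dir.create(data_dir)
write.csv(data.frame(subject = 1:3, a = c(0.1, 0.2, 0.3)),
          file.path(data_dir, "set01.csv"), row.names = FALSE)
write.csv(data.frame(subject = 1:3, b = c(10, 20, 30), c = c("x", "y", "z")),
          file.path(data_dir, "set02.csv"), row.names = FALSE)
write.csv(data.frame(subject = 1:3, d = c(TRUE, FALSE, TRUE)),
          file.path(data_dir, "set03.csv"), row.names = FALSE)

# Read every CSV in the folder into one list of data frames.
files <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)
list_of_dfs <- lapply(files, read.csv)

# Outer join on the shared key; all = TRUE keeps subjects missing from either side.
merge_by_subject <- function(x, y) merge(x, y, by = "subject", all = TRUE)

# Reduce folds the list from the left: merge(merge(df1, df2), df3), and so on.
combined_df <- Reduce(merge_by_subject, list_of_dfs)

# The same computation written out by hand, to show what Reduce is doing.
unrolled <- merge_by_subject(
  merge_by_subject(list_of_dfs[[1]], list_of_dfs[[2]]),
  list_of_dfs[[3]]
)
stopifnot(identical(combined_df, unrolled))
```

With 15 files the hand-written version would need 14 nested calls, which is the main reason to reach for `Reduce` here.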