3

I have three dataframes of data. I would like to join or merge the dataframes so that they are all in the one dataframe based on Day (but not all columns have data for each day), but I also want to keep all of the other columns.

I have the prepared the following dummy data:

# Create three dataframes of dummy data
df = data.frame(matrix(rnorm(20), nrow=10))
df2 = data.frame(matrix(rnorm(15), nrow=5))
df3 = data.frame(matrix(rnorm(30), nrow=10))

Days = seq(1:10)
Days2 = seq(from =5, to=9)

df1_all <- data.frame(Days, df)
colnames(df1_all) <- c("Days", "Survey1", "Survey2")
df2_all <- data.frame(Days2, df2)
colnames(df2_all) <- c("Days", "Survey3", "Survey4", "Survey5")
df3_all <- data.frame(Days, df3)
colnames(df3_all) <- c("Days", "Survey6", "Survey7", "Survey8")

How would the three dataframes be combined so they have a common column of Days but all of the survey columns remain?

As you can see df1_all and df3_all have days that are 1 to 10, but df2_all has days that are 5 to 9.

Mikhail Kholodkov
  • 23,642
  • 17
  • 61
  • 78
helplost
  • 43
  • 3

2 Answers2

4

In base R we can use merge with Reduce

Reduce(function(x, y) merge(x, y, by = "Days", all = T), list(df1_all, df2_all, df3_all))
#   Days    Survey1      Survey2      Survey3    Survey4    Survey5     Survey6
#1     1 -0.4968500 -1.157808548           NA         NA         NA  0.85023226
#2     2 -1.8060313  0.656588464           NA         NA         NA  0.69760871
#3     3 -0.5820759  2.548991071           NA         NA         NA  0.54999735
#4     4 -1.1088896 -0.034760390           NA         NA         NA -0.40273198
#5     5 -1.0149620 -0.669633580  0.336472797  2.0702709 -0.3170591 -0.19159377
#6     6 -0.1623095 -0.007604756  0.006892838 -0.1533984 -0.1777900 -1.19452788
#7     7  0.5630558  1.777084448 -0.455468738 -1.3907009 -0.1699941 -0.05315882
#8     8  1.6478175 -1.138607737 -0.366523933 -0.7235818 -1.3723019  0.25519600
#9     9 -0.7733534  1.367827179  0.648286568  0.2582618 -0.1737872  1.70596401
#10   10  1.6059096  1.329564791           NA         NA         NA  1.00151325
#       Survey7     Survey8
#1  -0.49558344 -0.82599859
#2   0.35555030  0.16698928
#3  -1.13460804 -0.89626463
#4   0.87820363  0.16818539
#5   0.97291675  0.35496826
#6   2.12111711 -0.05210512
#7   0.41452353 -0.19593462
#8  -0.47471847 -0.64906975
#9   0.06599349 -1.10976723
#10 -0.50247778  0.84927420

Or using dplyr::full_join

Reduce(dplyr::full_join, list(df1_all, df2_all, df3_all))

giving the same result (full_join automatically identifies the common column by which to join).

Maurits Evers
  • 49,617
  • 4
  • 47
  • 68
1

With dplyr :

library(dplyr)
df1_all %>% full_join(df2_all) %>% full_join(df3_all)
Nicolas2
  • 2,170
  • 1
  • 6
  • 15