1

How do you join many tables together in R? They all have a date column and then one other variable. The data was collected daily for many years (some since the 1970s when the instruments were first installed).

  • df1 contains (date, var1)
  • df2 contains (date, var2)
  • ...
  • df5 contains (date, var5)

The dataframes are of different lengths, depending on the start date for data collection.

I would like a new dataframe (DF6) that contains (date, var1, var2, var3, var4, var5). I expect that there will be a lot of NAs where the variable data is missing for earlier dates from some of the dataframes.

I have tried this:

DF6 <- full_join(df1,df2,df3, df4, df5, by = "date")

but I received this error:

Error in full_join(): ! suffix must be a character vector of length 2, not a <data.frame> object of length

Is there something obvious that I'm missing? Is there any easy way to join many dataframes?

L Tyrone
  • 1,268
  • 3
  • 15
  • 24
ayesha
  • 135
  • 15

2 Answers2

1

One option is this, using dplyr and purrr.

library(dplyr) 
library(purrr)

list(df1, df2, df3) %>% 
  reduce(left_join, by = "date") %>% 
  mutate(value = coalesce(!!!select(., starts_with("var")))) %>% 
  select(date, value)

#   date   value
# 1   70    14
# 2   80    24
# 3   90    94
# 4   10   103
Rodrigo
  • 53
  • 8
1

A reproducible example with some "toy" data based on the structure of the dataframes you mentioned in your question:

library(dplyr)
library(purrr)

# set.seed() value used to make the final results of this example 
# reproducible, only needed for reproducing the "toy" data
set.seed(1) 

# Create five "toy" dataframes
for(i in 1:5) {
  sDate <- sample(seq(as.Date('1970/01/01'), as.Date('2023/05/03'), by="day"), 1)
  tmp <- data.frame(date = seq(sDate, as.Date('2023/05/03'), by="day"))
  tmp[,paste0("var", i)] <- sample(1:100, length(tmp[,1]), replace = TRUE)
  assign(paste0("df", i), tmp)
}

# Create list of all five dataframes
df_list <- list(df1, df2, df3, df4, df5)

# Use reduce() from the purrr package to full join the dataframes
df6 <- df_list %>% reduce(full_join, by = "date")

# Sample of results with anticipated NA behaviour
df6[1:10,]
#          date var1 var2 var3 var4 var5
# 1  2017-08-22   68   30   NA  100   NA
# 2  2017-08-23   39   45   NA    9   NA
# 3  2017-08-24    1   57   NA   45   NA
# 4  2017-08-25   34   97   NA   91   NA
# 5  2017-08-26   87   99   NA   22   NA
# 6  2017-08-27   43   21   NA   18   NA
# 7  2017-08-28   14   25   NA   54   NA
# 8  2017-08-29   82   17   NA   82   NA
# 9  2017-08-30   59   21   NA   15   NA
# 10 2017-08-31   51   29   NA   83   NA
L Tyrone
  • 1,268
  • 3
  • 15
  • 24