1

I would like to merge multiple dataframe in one shot. Below the method, I am currently using for merging. My data:

df <- data.frame('ID'=c('A','B','C'),'YEAR'=c('2020','2020','2020'),'MONTH'=c('1','1','1'),'DAY'=c('16','16','16'),'HOUR'=c('15','15','15'),'VALUE1'=c(1,2,3))

df1 <- data.frame('ID'=c('A','B','C'),'YEAR'=c('2020','2020','2020'),'MONTH'=c('1','1','1'),'DAY'=c('16','16','16'),'HOUR'=c('15','15','15'),'VALUE2'=c(3,4,5))

df2 <- data.frame('ID'=c('A','B','C'),'YEAR'=c('2020','2020','2020'),'MONTH'=c('1','1','1'),'DAY'=c('16','16','16'),'VALUE3'=c(6,7,8))

df3 <- data.frame('ID'=c('A','B','C'),'YEAR'=c('2020','2020','2020'),'MONTH'=c('1','1','1'),'VALUE4'=c(9,10,11))

My method of merging:

df4 <- merge(df,df1,by=c('ID','DAY','MONTH','YEAR','HOUR'),all.x=T)
df5 <- merge(df4,df2,by=c('ID','DAY','MONTH','YEAR'),all.x=T)   
df6 <- merge(df5,df3,by=c('ID','MONTH','YEAR'),all.x=T) 

My output:

ID MONTH YEAR DAY HOUR VALUE1 VALUE2 VALUE3 VALUE4
1  A     1 2020  16   15      1      3      6      9
2  B     1 2020  16   15      2      4      7     10
3  C     1 2020  16   15      3      5      8     11

Is there any good method of merging which can make code more efficient. I appreciate any help.

MLavoie
  • 9,671
  • 41
  • 36
  • 56
Tushar Lad
  • 490
  • 1
  • 4
  • 17
  • Have also a look at [Simultaneously merge multiple data.frames in a list](https://stackoverflow.com/q/8091303/10488504) and [Merging a lot of data.frames](https://stackoverflow.com/q/14096814/10488504). – GKi Feb 17 '20 at 11:49

2 Answers2

3

You can use merge in Reduce:

Reduce(merge, list(df, df1, df2, df3))
#  ID YEAR MONTH DAY HOUR VALUE1 VALUE2 VALUE3 VALUE4
#1  A 2020     1  16   15      1      3      6      9
#2  B 2020     1  16   15      2      4      7     10
#3  C 2020     1  16   15      3      5      8     11
GKi
  • 37,245
  • 2
  • 26
  • 48
1
library(dplyr)

df %>%
    left_join(df1, by=c('ID','DAY','MONTH','YEAR','HOUR')) %>%
    left_join(df2, by=c('ID','DAY','MONTH','YEAR')) %>%
    left_join(df3, by=c('ID','MONTH','YEAR'))

Output:

  ID YEAR MONTH DAY HOUR VALUE1 VALUE2 VALUE3 VALUE4
1  A 2020     1  16   15      1      3      6      9
2  B 2020     1  16   15      2      4      7     10
3  C 2020     1  16   15      3      5      8     11

So, this is a little more efficient, because it doesn't use memory for the intermediate results.

Georgery
  • 7,643
  • 1
  • 19
  • 52
  • @Thanks Georgery.This is helpful.is there any other method available? because if i merge large data code will take some time so i am curious to know if is there any? – Tushar Lad Feb 17 '20 at 10:32
  • You cannot get much faster than this. The `data.table` package could be a little bit faster, but probably not significantly. Sometimes it just takes some time. – Georgery Feb 17 '20 at 10:48