Combining multiple data frames with different number of columns

Question

I'm preparing the stats for currency values, I have currency values for every day, every year is in an individual excel-file. I've managed to read xls files, but I want to merge the data frames for every year. They look like:

Date      EUR   SEK   NOR   PLN   ROM   SKR
1-1-1998  2,21  1,23  1,13  2,99  9,12  6,17
2-1-1998  1,13  12,2  2,11  9,22  1,11  2,33

But some of them has more colums (additional currency), like:

Date      EUR   SEK   NOR   PLN   ROM   SKR   JKC
1-1-1998  2,21  1,23  1,13  2,99  9,12  6,17  1,11
2-1-1998  1,13  12,2  2,11  9,22  1,11  2,33  2,13

I just want to merge them this way:

Date      EUR   SEK   NOR   PLN   ROM   SKR
1-1-1998  2,21  1,23  1,13  2,99  9,12  6,17
2-1-1998  1,13  12,2  2,11  9,22  1,11  2,33
Date      EUR   SEK   NOR   PLN   ROM   SKR   JKC
1-1-1998  2,21  1,23  1,13  2,99  9,12  6,17  1,11
2-1-1998  1,13  12,2  2,11  9,22  1,11  2,33  2,13

etc.

I tried to use merge(), but it treated identical values as one, rbind doesn't work because there are different numbers of columns. How should I combine those data frames?

Could you create empty columns that coincide with the other data frames (e.g. create the JKC column for your first data frame) so `rbind` does work? — ccapizzano, Jun 26 '14 at 12:52
Why wouldn't you merge on date? Your proposed format has a lot of redundant information. — rrs, Jun 26 '14 at 14:12

score 4 · Accepted Answer · edited Jun 26 '14 at 13:16

Like Henrik mentioned, rbind.fill from the plyr packge is very helpfull here:

library(plyr)

a <- data.frame(
     data = c('1-1-1998', '2-1-1998'),
     EUR = c(2.21, 1.13),
     SEK = c(1.23, 12.2)
)

b <- data.frame(
     data = c('1-1-1998', '2-1-1998'),
     EUR = c(2.21, 1.13),
     SEK = c(1.23, 12.2),
     JKC = c(1.11, 2.13)
)

rbind.fill(a, b)

Resulting in:

      data  EUR   SEK  JKC
1 1-1-1998 2.21  1.23   NA
2 2-1-1998 1.13 12.20   NA
3 1-1-1998 2.21  1.23 1.11
4 2-1-1998 1.13 12.20 2.13

score 3 · Answer 2 · answered Jun 26 '14 at 13:07

I'd suggest to melt the data to long format, which usually is easier to use for further analyses in R:

DF1 <- read.table(text="Date      EUR   SEK   NOR   PLN   ROM   SKR
1-1-1998  2,21  1,23  1,13  2,99  9,12  6,17
2-1-1998  1,13  12,2  2,11  9,22  1,11  2,33", header=TRUE, dec=",")

DF2 <- read.table(text="Date      EUR   SEK   NOR   PLN   ROM   SKR   JKC
1-2-1998  2,21  1,23  1,13  2,99  9,12  6,17  1,11
2-2-1998  1,13  12,2  2,11  9,22  1,11  2,33  2,13", header=TRUE, dec=",")

myfiles <- list(DF1, DF2) 
#use lapply(list.files(yourpath), read.table, header=TRUE, dec=",") to create this list


#reshape to long format
library(reshape2)
myfiles <- lapply(myfiles, melt, id.vars="Date", value.name="value", variable.name="currency")
mydata <- do.call(rbind, myfiles)
head(mydata)
#      Date currency value
#1 1-1-1998      EUR  2.21
#2 2-1-1998      EUR  1.13
#3 1-1-1998      SEK  1.23
#4 2-1-1998      SEK 12.20
#5 1-1-1998      NOR  1.13
#6 2-1-1998      NOR  2.11

If you must, you can then reshape back to wide format:

dcast(mydata, Date~currency)
#      Date  EUR   SEK  NOR  PLN  ROM  SKR  JKC
#1 1-1-1998 2.21  1.23 1.13 2.99 9.12 6.17   NA
#2 2-1-1998 1.13 12.20 2.11 9.22 1.11 2.33   NA
#3 1-2-1998 2.21  1.23 1.13 2.99 9.12 6.17 1.11
#4 2-2-1998 1.13 12.20 2.11 9.22 1.11 2.33 2.13

Combining multiple data frames with different number of columns

2 Answers2