The first thing that usually comes to mind with these types of problems is merge
, perhaps in conjunction with a Reduce(function(x, y) merge(x, y, by = "somecols", all = TRUE), yourListOfDataFrames)
.
However, merge
is not always the most efficient function, especially since it looks like you want to "collapse" all the values to fill in the rows from left to right, which would not be the default merge
behavior.
Instead, I suggest you stack everything into one long data.frame
and reshape it after you have added an index variable.
Here are two approaches:
Option 1: "dplyr" + "tidyr"
- Use
mget
to put all of your data.frame
s into a list
.
- Use
rbind_all
to convert that list
into a single data.frame
.
- Use
sequence(n())
in mutate
from "dplyr" to group the data and create an index.
- Use
spread
from "tidyr" to transform from a "long" format to a "wide" format.
library(dplyr)
library(tidyr)
combined <- rbind_all(mget(ls(pattern = "^file\\d")))
combined %>%
group_by(V1, V2) %>%
mutate(time = sequence(n())) %>%
ungroup() %>%
spread(time, V3, fill = "")
# Source: local data frame [7 x 5]
#
# V1 V2 1 2 3
# 1 1 7 x
# 2 3 10 b
# 3 4 9 d j
# 4 12 13 a e m
# 5 13 15 b k
# 6 14 17 c c
# 7 24 9 d
Option 2: "data.table"
- Use
mget
to put all of your data.frame
s into a list
.
- Use
rbindlist
to convert that list into a single data.table
.
- Use
sequence(.N)
to generate your sequence by your groups.
- Use
dcast.data.table
to convert the "long" data.table
into a "wide" one.
library(data.table)
dcast.data.table(
rbindlist(mget(ls(pattern = "^file\\d")))[,
time := sequence(.N), by = list(V1, V2)],
V1 + V2 ~ time, value.var = "V3", fill = "")
# V1 V2 1 2 3
# 1: 1 7 x
# 2: 3 10 b
# 3: 4 9 d j
# 4: 12 13 a e m
# 5: 13 15 b k
# 6: 14 17 c c
# 7: 24 9 d
Both of these answers assume we are starting with the following sample data:
file1 <- structure(
list(V1 = c(12L, 13L, 14L, 4L), V2 = c(13L, 15L, 17L, 9L),
V3 = c("a", "b", "c", "d")), .Names = c("V1", "V2", "V3"),
class = "data.frame", row.names = c(NA, -4L))
file2 <- structure(
list(V1 = c(12L, 3L, 14L, 4L), V2 = c(13L, 10L, 17L, 9L),
V3 = c("e", "b", "c", "j")), .Names = c("V1", "V2", "V3"),
class = "data.frame", row.names = c(NA, -4L))
file3 <- structure(
list(V1 = c(12L, 13L, 1L, 24L), V2 = c(13L, 15L, 7L, 9L),
V3 = c("m", "k", "x", "d")), .Names = c("V1", "V2", "V3"),
class = "data.frame", row.names = c(NA, -4L))