I have a directory of identically structured CSV files and I'm trying to load all of them into a single data.frame. Currently I use lapply() with read.csv() to get a list of data.frames, and I'm looking for an elegant way to combine that list into one data.frame without an explicit loop.

The result of my lapply(list.of.file.names, read.csv) can be approximated by this structure:

list.of.dfs <- list(data.frame(A=sample(seq(from = 1, to = 10), size = 5),
                               B=sample(seq(from = 1, to = 10), size = 5)), 
                    data.frame(A=sample(seq(from = 1, to = 10), size = 5),
                               B=sample(seq(from = 1, to = 10), size = 5)), 
                    data.frame(A=sample(seq(from = 1, to = 10), size = 5),
                               B=sample(seq(from = 1, to = 10), size = 5))
                    )

What is an elegant version of the following line that works for lists of arbitrary length:

one.data.frame <- rbind(list.of.dfs[[1]], list.of.dfs[[2]], list.of.dfs[[3]])

I can do this with a for loop, but is there a vector-based solution?

MattBagg

2 Answers


do.call is the basic way of doing this.

do.call(rbind, list.of.dfs)

But it can be slow if you have a lot of data items, and other discussions here on S.O. have centred on how to speed things up by using custom functions or the data.table or plyr packages. E.g.:

Why is rbindlist "better" than rbind?

Can rbind be parallelized in R?

Performance of rbind.data.frame
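
Applied to a list built the same way as the one in the question, the do.call() approach can be checked directly; this is a minimal sketch using only base R:

```r
# Build three small data.frames with identical columns,
# mirroring the structure in the question.
set.seed(1)
list.of.dfs <- replicate(3,
                         data.frame(A = sample(1:10, 5),
                                    B = sample(1:10, 5)),
                         simplify = FALSE)

# do.call() constructs and evaluates rbind(df1, df2, df3)
# for however many elements the list holds.
one.data.frame <- do.call(rbind, list.of.dfs)

nrow(one.data.frame)  # 15 rows: 3 data.frames of 5 rows each
```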

thelatemail

@thelatemail alluded to it, but you might want to use the following for speed:

rbindlist(list.of.dfs)

(requires library(data.table))
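
A sketch of the full read-and-combine workflow with data.table, assuming the package is installed; the directory path here is hypothetical:

```r
library(data.table)

# Hypothetical directory of identically structured CSV files.
file.names <- list.files("path/to/csvs", pattern = "\\.csv$",
                         full.names = TRUE)

# fread() is data.table's fast CSV reader; rbindlist() stacks
# the resulting list much faster than do.call(rbind, ...).
# idcol adds a column recording which list element (file)
# each row came from.
one.data.frame <- rbindlist(lapply(file.names, fread),
                            idcol = "source.file")
```

rbindlist() also accepts fill = TRUE, which pads columns missing from some files with NA instead of erroring.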

Ricardo Saporta