I have a directory of identically structured CSV files and I'm trying to load all of them into a single data.frame. Currently I use lapply() with read.csv() to get a list of data.frames, and I'm looking for an elegant way to combine that list into one data.frame without an explicit loop.

The result of my lapply(list.of.file.names, read.csv) can be approximated by this structure:

list.of.dfs <- list(data.frame(A=sample(seq(from = 1, to = 10), size = 5),
                               B=sample(seq(from = 1, to = 10), size = 5)), 
                    data.frame(A=sample(seq(from = 1, to = 10), size = 5),
                               B=sample(seq(from = 1, to = 10), size = 5)), 
                    data.frame(A=sample(seq(from = 1, to = 10), size = 5),
                               B=sample(seq(from = 1, to = 10), size = 5))
                    )

What is an elegant version of the following line that works for lists of arbitrary length:

one.data.frame <- rbind(list.of.dfs[[1]], list.of.dfs[[2]], list.of.dfs[[3]])

I can do this with a for loop, but is there a vector-based solution?

MattBagg

2 Answers


do.call is the basic way of doing this.

do.call(rbind, list.of.dfs)

But it can be slow if you have a lot of data items, and other discussions here on S.O. have centred on how to speed things up by using custom functions or the data.table or plyr packages. E.g.:

Why is rbindlist "better" than rbind?

Can rbind be parallelized in R?

Performance of rbind.data.frame
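
Applied to a list built the same way as the one in the question, the do.call() approach can be checked directly; this is a minimal sketch using only base R:

```r
# Build three small data.frames with identical columns,
# mirroring the structure in the question.
set.seed(1)
list.of.dfs <- replicate(3,
                         data.frame(A = sample(1:10, 5),
                                    B = sample(1:10, 5)),
                         simplify = FALSE)

# do.call() constructs and evaluates rbind(df1, df2, df3)
# for however many elements the list holds.
one.data.frame <- do.call(rbind, list.of.dfs)

nrow(one.data.frame)  # 15 rows: 3 data.frames of 5 rows each
```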

thelatemail

@thelatemail alluded to it, but you might want to use the following for speed:

rbindlist(list.of.dfs)

(requires library(data.table))
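
A sketch of the full read-and-combine workflow with data.table, assuming the package is installed; the directory path here is hypothetical:

```r
library(data.table)

# Hypothetical directory of identically structured CSV files.
file.names <- list.files("path/to/csvs", pattern = "\\.csv$",
                         full.names = TRUE)

# fread() is data.table's fast CSV reader; rbindlist() stacks
# the resulting list much faster than do.call(rbind, ...).
# idcol adds a column recording which list element (file)
# each row came from.
one.data.frame <- rbindlist(lapply(file.names, fread),
                            idcol = "source.file")
```

rbindlist() also accepts fill = TRUE, which pads columns missing from some files with NA instead of erroring.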

Ricardo Saporta