Merging a large number of files from one directory into a data frame in R

Question

I have a large number of data files (>1000) in a single directory. I would like to merge them all in a single data frame in R. They all have the same number and types of columns. So far what I have is:

setwd("directory")
files <- list.files()
for (i in 1:length(files)) assign(files[i], read.csv(files[i]))

This creates data frames for each of the 1000+ files. Is there any way to merge them, without having to type out a list of all 1000+ file names?

Any help would be appreciated!

I'd like to know who teaches to use `assign` in R to newbies. DO NOT use `assign`. Just try `do.call(rbind,lapply(files,read.csv))`. — nicola, Jul 25 '15 at 08:29
check [this](http://stackoverflow.com/questions/8091303/simultaneously-merge-multiple-data-frames-in-a-list) and [this](http://stackoverflow.com/questions/31480615/data-table-left-outer-join-on-multiple-tables/31481121#31481121) too — Veerendra Gadekar, Jul 25 '15 at 08:29
Do you need to 'merge' or are you appending? And if you are merging, what variable are you merging on? — Jason, Jul 25 '15 at 10:02

MichaelChirico · Answer 1 · 2019-02-27T13:41:39.460

12

The standard way to do this with data.table (recommended because of its speed) is:

library(data.table)
data <- rbindlist(lapply(list.files(), fread))

There are also additional functionalities, e.g.

rbindlist(lapply(list.files(), fread), fill = TRUE)

Will take care of the possibility that some or many of your files have different column names--any non-overlap will be filled with NA in those files lacking that column.

EDIT: as @nicola mentioned, using assign is to be avoided in general unless you really know what you're doing.

See this post for further reference to that end.

edited Feb 27 '19 at 13:41

answered Jul 25 '15 at 15:21

MichaelChirico

33,841
14
113
198

That's a great solution. What if I wanted to bind files adding new columns instead of new lines? Is there an equivalent `cbindlist` ? – rafa.pereira Sep 14 '15 at 15:32
@RafaelPereira that's really a different question and i'm sure that it's been asked. in short it won't be as simple as `cbindlist`, but off the top of my head there are two options: 1) use `rbindlist` and then `dcast` the result; 2) use the `on` option and the `:=` operator to sequentially merge the columns. One point to keep in mind is that a merge is essential for adding columns because we should think of the row as a sacrosanct observation--that is, the values associated with each ID are tied specifically to that ID, i.e. order matters, which is a contrast w.r.t. adding rows. – MichaelChirico Sep 14 '15 at 17:51

score 3 · Answer 2 · edited Nov 09 '17 at 08:54

One good way to do that is to utilize data.table. This library has two benefits that will work in your case: a) it has a fast way of reading .csv files, and b) a fast way of combining data.tables (which are an extension of data.frame) into one. So in this spirit, let me propose the following alternative:

# if you don't have data.table installed, run install.packages('data.table') first
library(data.table)
files <- list.files('directory', full.names = TRUE)
#create a list to manage the individual files, only used to merge them in the end
FILES_LIST=vector("list",length(files)) 
for (i in 1:length(files)) {
    FILES_LIST[[i]]<-fread(files[i]) #this reads your .csv file
}
FILES_LIST = rbindlist(FILES_LIST) #this merges all of your files in a big data.table

The variable you are interested in, in the end is FILES_LIST.

I hope this helps.

Merging a large number of files from one directory into a data frame in R

2 Answers2

Linked