
I have over 2000 *.rds files of about 18,000 obs each that I would like to merge into one data.frame. I know it's not efficient to grow objects in a for loop, but I have done so anyway, and it is taking really long as it progresses. Is there a faster way to do this?

Here are a couple of things I've tried so far:

lapply:

mergedat <- do.call(rbind, lapply(list.files("dat/"), readRDS))

for loop:

files <- list.files("dat/")
merged <- data.frame()
for (i in unique(files)) {
  df <- readRDS(paste0("dat/", i))  # read one file
  merged <- rbind(merged, df)       # copies all of merged on every iteration, which is why it slows down
}
Vedda
  • In some extremes, quoting the function might be faster (ala `do.call("rbind", ...`). In this case, though, have you tried `dplyr::bind_rows`? In my tests (using small data.frames), it runs almost an order of magnitude faster. – r2evans Jan 05 '16 at 06:18
  • @r2evans I have not tried `dplyr` option; I guess I'll give that a go too. – Vedda Jan 05 '16 at 06:19
  • `data.table::rbindlist` would also be worth checking out. – Gregor Thomas Jan 05 '16 at 06:38
  • @Gregor This was it. It merged everything in less than a minute, where `rbind` took over 20. Thanks! – Vedda Jan 05 '16 at 06:52
  • FYI, the first `do.call` is missing a trailing `)` – steveb Jan 05 '16 at 06:55
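
For reference, a minimal sketch of the `data.table::rbindlist` suggestion from the comments above, assuming every `.rds` file in `dat/` contains a data frame with the same columns:

library(data.table)

files  <- list.files("dat/", full.names = TRUE)  # full paths so readRDS can find each file
merged <- rbindlist(lapply(files, readRDS))      # read everything into a list, bind once
merged <- as.data.frame(merged)                  # drop the data.table class if a plain data.frame is needed

Reading into a list first and binding once avoids the repeated copying that growing `merged` with `rbind` inside the loop causes, which matches the speedup reported above (under a minute versus over twenty minutes).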

1 Answer


It looks like you were close with your first attempt. With a slight modification, the following worked for me:

mergedat <- do.call('rbind', lapply(list.files("dat/", full.names = TRUE), readRDS))

The only thing I added was `full.names = TRUE`, so that `readRDS` receives full paths rather than bare file names.
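
The `dplyr::bind_rows` alternative mentioned in the comments can be dropped in the same way; a sketch, under the same assumption that all files share the same columns:

library(dplyr)

files    <- list.files("dat/", full.names = TRUE)  # full paths again
mergedat <- bind_rows(lapply(files, readRDS))      # accepts a list of data frames and binds them in one pass

Either way, the key point is to read all files into a list first and bind once, rather than growing a data.frame inside a loop.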

steveb
  • OP didn't say it doesn't work. OP said it takes time to finish. –  Jan 05 '16 at 06:44
  • Thanks, but both of those I suggested work. They just take a long time. Any way to speed things up? – Vedda Jan 05 '16 at 06:44
  • Ok, got it. There was a typo in the original post, I will comment on it there. – steveb Jan 05 '16 at 06:54
  • Perhaps I should delete this answer and Gregor should move the suggestion to an "Answer" to get credit. – steveb Jan 05 '16 at 07:01
  • Whoa, this is an old post, but you can use `dplyr::bind_rows`, which is much faster. – Ahdee Sep 11 '21 at 16:01
  • @Ahdee yes, this is old; I had forgotten about this question. Since this question is considered closed because an answer exists, I don't know if it is worth updating here. In addition to your suggestion, I have been using the following to quickly load many large data files (10s of millions of rows): `my.dt <- data.table::rbindlist(lapply(fnames, function(fn) data.table::fread(fn)))`. This code has run significantly faster than what I have previously done. I believe there may also be other options that are faster. – steveb Sep 11 '21 at 16:48
  • @steveb yes, agreed, data.table or dplyr's bind_rows is so much faster. I was confused because of some weird case where I needed to stitch together >9K data frames and was surprised that after many hours it was still going. With bind_rows it took less than 30 mins. Thanks. – Ahdee Sep 11 '21 at 17:03
  • Also, the duplicate alert is not correct, since this is about speed and not so much about merging multiple tables. – Ahdee Sep 11 '21 at 17:04