
I have roughly 50000 .rda files. Each contains a dataframe named results with exactly one row. I would like to append them all into one dataframe.

I tried the following, which works, but is slow:

library(dplyr)  # for bind_rows()

root_dir <- paste(path, "models/", sep="")
files <- paste(root_dir, list.files(root_dir), sep="")
load(files[1])
results_table <- results
rm(results)

for (i in 2:length(files)) {
  print(paste("We are at step ", i, sep=""))
  load(files[i])
  # each call copies the whole, ever-growing results_table
  results_table <- bind_rows(list(results_table, results))
  rm(results)
}

Is there a more efficient way to do this?

safex
  • Possibly a duplicate? https://stackoverflow.com/a/34711970/3154189 – Marcel Gangwisch Jan 29 '20 at 12:01
  • If you save each file inside a list and then unnest() it afterwards, I believe you won't have to copy the whole table in memory at each step. What slows this down is the repeated binding of rows. – André Oliveira Jan 29 '20 at 12:28
  • Does this answer your question? [Combine multiple .RData files containing objects with the same name into one single .RData file](https://stackoverflow.com/questions/14757668/combine-multiple-rdata-files-containing-objects-with-the-same-name-into-one-sin) – rafa.pereira Jan 29 '20 at 14:41
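The "bind once" idea from the comments can be illustrated with a small base-R sketch. No .rda files are involved here. `make_row()` is a hypothetical stand-in for one loaded `results` row, and `n` is an arbitrary sample size. Growing a data frame with `rbind()` inside a loop copies the accumulated table on every iteration. Collecting the pieces in a pre-allocated list and binding a single time avoids those repeated copies, and the result is the same.

```r
# Hypothetical stand-in for one loaded one-row `results` data frame.
make_row <- function(i) data.frame(id = as.integer(i), value = i^2)

n <- 500  # arbitrary sample size for the illustration

# Slow pattern: the accumulated table is copied on every iteration,
# so total copying grows roughly quadratically with n.
grown <- make_row(1)
for (i in 2:n) {
  grown <- rbind(grown, make_row(i))
}

# Faster pattern: store each piece in a pre-allocated list, then bind once.
pieces <- vector("list", n)
for (i in seq_len(n)) {
  pieces[[i]] <- make_row(i)
}
bound_once <- do.call(rbind, pieces)
```

With real files, `make_row(i)` would be replaced by loading `files[i]`, and `data.table::rbindlist()` or `dplyr::bind_rows()` can take the place of `do.call(rbind, ...)`.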

2 Answers


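Using .rds files is a little easier, because readRDS() returns the saved object directly instead of restoring it under its original name. But if you are limited to .rda, the following might be useful. I'm not certain whether it is faster than what you have done: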

library(purrr)
library(dplyr)
library(tidyr)

## make and write some sample data to .rda
x <- 1:10

fake_files <- function(x) {
  df <- tibble(x = x)
  save(df, file = here::here(paste0(x, ".rda")))
  invisible(NULL)
}

purrr::walk(x, fake_files)

## map and load the .rda files into a single tibble

load_rda <- function(file) {
  load(file = file) # load() creates the saved object(s) inside this function's environment
  df                # df is the name of the object stored in each .rda file
}

rda_files <- tibble(files = list.files(path = here::here(),
                                       pattern = "\\.rda$",   # regex, not a glob
                                       full.names = TRUE)) %>%
  mutate(data = map(files, load_rda)) %>%
  unnest(data)
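For comparison, here is a minimal sketch of the .rds route. It assumes you can re-save the results with `saveRDS()`. The directory and file names are hypothetical sample data written to `tempdir()`. Because `readRDS()` simply returns the stored object, each file can be read directly and the rows bound once, with no `load()` step.

```r
# Hypothetical sample data: ten one-row data frames, one per .rds file.
rds_dir <- file.path(tempdir(), "rds_demo")
dir.create(rds_dir, showWarnings = FALSE)
for (i in 1:10) {
  saveRDS(data.frame(x = i), file.path(rds_dir, paste0(i, ".rds")))
}

# readRDS() returns each object directly; bind all rows in one call.
rds_files <- list.files(rds_dir, pattern = "\\.rds$", full.names = TRUE)
rds_table <- do.call(rbind, lapply(rds_files, readRDS))
```

Note that `list.files()` sorts names alphabetically ("1.rds", "10.rds", "2.rds", ...), so the row order follows the file names, not the numeric order.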

mpschramm

This is untested code but should be pretty efficient:

root_dir <- paste(path, "models/", sep="")
files <- paste(root_dir, list.files(root_dir), sep="")

data_list <- lapply(files, function(f) {
  message("loading file: ", f)
  name <- load(f)   # load() returns the name(s) of the object(s) it created
  get(name)         # returns the object whose name is stored in `name`
})

results_table <- data.table::rbindlist(data_list)

data.table::rbindlist is very similar to dplyr::bind_rows but a little faster.
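A possible variant of the loading step loads each file into its own fresh environment and pulls the object out by name. This avoids `eval(parse())` and cannot overwrite variables in the calling code. The sketch assumes each file holds exactly one object, as in the question. `load_single()` and the sample files are hypothetical. Base `rbind()` is used so the example has no package dependencies; `data.table::rbindlist(data_list)` would work on the same list.

```r
# Load one .rda into a fresh environment and return its single object.
# Assumes (as in the question) that each file contains exactly one object.
load_single <- function(f) {
  env <- new.env()
  obj_name <- load(f, envir = env)   # load() returns the names it created
  if (length(obj_name) != 1) stop("expected exactly one object in ", f)
  env[[obj_name]]
}

# Hypothetical sample data: five files, each with a one-row `results` data frame.
rda_dir <- file.path(tempdir(), "rda_demo")
dir.create(rda_dir, showWarnings = FALSE)
for (i in 1:5) {
  results <- data.frame(model = i, score = i / 10)
  save(results, file = file.path(rda_dir, paste0("model_", i, ".rda")))
}

demo_files <- list.files(rda_dir, pattern = "\\.rda$", full.names = TRUE)
data_list <- lapply(demo_files, load_single)
results_table <- do.call(rbind, data_list)
```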

JBGruber