I have a directory of directories:

models <- dir("shopperml_pr_points")
> models
 [1] "add_email_subscribers" "custom_domain"         "email_campaign"        "fb_connect"            "gmb"                  
 [6] "holdout"               "ola"                   "ols"                   "post_to_fb"            "sev" 

Each of these directories contains a set of csv files, e.g.

> list.files(paste0("shopperml_pr_points", "/", models[1]))
[1] "add_email_subscribers_task_completed_pr_auc_1547157396.csv" "add_email_subscribers_task_completed_pr_auc_1547157473.csv"
[3] "add_email_subscribers_task_completed_pr_auc_1547157551.csv" "add_email_subscribers_task_completed_pr_auc_1547157631.csv"
[5] "add_email_subscribers_task_completed_pr_auc_1547157712.csv"

I would like to create a list of dataframes, one for each directory within models. So, the first df will be based on directory "add_email_subscribers" and will be the combination of the 5 csv files above.

I wanted to use do.call(rbind, read.table) per this post, but since my working directory is not the one the files are read from, I'm finding this challenging. I went down the path of pasting together a long string for each individual csv file, but I wondered if there's a more elegant R solution that can already detect the full path of a file, such as those within list.files(paste0("shopperml_pr_points", "/", models[1])).

How can I create a list of 9 dataframes based on the directories within models where each directory contains ~5 csv files and those 5 csv files should be collapsed into one dataframe?

Doug Fir
  • `list.files(..., full.names = TRUE)` – dww Jan 12 '19 at 01:18
  • Thanks for the tip. I'm trying to get just one sample iteration to run with `i = 1; model_dir <- list.files(paste0(dir_start, models[i]), full.names = T); minidf <- do.call(rbind_list(map(model_dir, read.table, stringsAsFactors = F, sep = ",", header = T), idcol = T))` but get the error "Error in do.call(rbind_list(map(model_dir, read.table, stringsAsFactors = F, : argument "args" is missing, with no default". Any ideas? – Doug Fir Jan 12 '19 at 01:25

2 Answers

1

This should do it. First get the subdirectories (`subdirs`), then for each one, read its csv files and bind them together. You'll end up with a list of data frames, one per subdirectory.

parent_dir <- "shopperml_pr_points"

subdirs <- dir(parent_dir, full.names=TRUE)

df_list <- lapply(subdirs, function(path){
  files <- dir(path, full.names=TRUE, pattern="\\.csv$")
  return(do.call(rbind, lapply(files, read.csv)))
})
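As a small extension of the snippet above, the list elements can be named after their directories so that each data frame can be looked up by model name. This is only a sketch: the scratch directory `demo_pr_points`, the model names, and the columns `recall` and `precision` are made up for illustration and are not taken from the question's data.

```r
# Build a throwaway directory tree that mimics the layout in the question
# (directory, file, and column names here are hypothetical).
parent_dir <- file.path(tempdir(), "demo_pr_points")
for (model in c("gmb", "ols")) {
  dir.create(file.path(parent_dir, model), recursive = TRUE, showWarnings = FALSE)
  for (i in 1:2) {
    write.csv(data.frame(recall = c(0.1, 0.2), precision = c(0.9, 0.8)),
              file.path(parent_dir, model, paste0(model, "_", i, ".csv")),
              row.names = FALSE)
  }
}

subdirs <- dir(parent_dir, full.names = TRUE)

# One data frame per subdirectory, built from all csv files inside it
df_list <- lapply(subdirs, function(path) {
  files <- dir(path, full.names = TRUE, pattern = "\\.csv$")
  do.call(rbind, lapply(files, read.csv))
})

# Label each element with its directory name
names(df_list) <- basename(subdirs)

str(df_list, max.level = 1)
```

With names in place, `df_list[["gmb"]]` returns the combined data frame for that directory instead of requiring a positional index.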

If you want to keep track of which file each row came from within each df, you can add a from_file column to each df, for example:

df_list2 <- lapply(subdirs, function(path){
  files <- dir(path, full.names=TRUE, pattern="\\.csv$")
  inner_df_list <- lapply(files, function(fname){
    dat <- read.csv(fname)
    dat$from_file <- fname
    return(dat)
  })
  return(do.call(rbind, inner_df_list))
})
lefft
  • Thanks for the answer here. I was trying to add an iterator to the data frames to know which number of file was read in. Tried rbindlist per this post: https://stackoverflow.com/questions/54155700/reading-multiple-csv-files-into-a-single-df-and-adding-a-number-iterator-as-a-co . So the final block looks like `models_dir <- "shopperml_pr_points"; models <- dir(parent_dir, full.names=TRUE); df_list <- lapply(models, function(path){ files <- dir(path, full.names = T, pattern="\\.csv$"); return(do.call(rbindlist, lapply(files, read.csv), idcol = TRUE)) })` but it throws an unused argument error – Doug Fir Jan 12 '19 at 01:40
  • If you know how to integrate `rbindlist`, do let me know. – Doug Fir Jan 12 '19 at 01:40
  • Updated answer should give what you need, as I understand. If you're talking about `data.table::rbindlist()`, not sure, I don't really work with `data.table`. – lefft Jan 12 '19 at 02:08
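On the two errors mentioned in the comments: `do.call(what, args)` expects a function plus a single list holding all of its arguments, so wrapping the whole call inside `do.call()` or passing `idcol` as an extra argument to `do.call()` fails. The sketch below uses three tiny made-up data frames in place of the csv files, adds a numeric file counter in base R (the column name `file_id` is a hypothetical choice), and shows the `data.table::rbindlist()` form only if that package happens to be installed.

```r
# Three small data frames standing in for the csv files of one directory
dfs <- list(
  data.frame(x = 1:2),
  data.frame(x = 3:4),
  data.frame(x = 5:6)
)

# do.call(what, args): `args` is ONE list holding every argument for `what`
stacked <- do.call(rbind, dfs)

# Base R file counter: tag each data frame with its position in the list
tagged <- Map(function(d, i) cbind(file_id = i, d), dfs, seq_along(dfs))
with_id <- do.call(rbind, tagged)

# With data.table (if installed), rbindlist() takes the list directly,
# so no do.call() is needed; idcol = TRUE adds an ".id" column
if (requireNamespace("data.table", quietly = TRUE)) {
  with_id_dt <- data.table::rbindlist(dfs, idcol = TRUE)
}
```

Inside the `lapply()` over directories, `dfs` would be replaced by `lapply(files, read.csv)`.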
1
list.files(path = 'C:/Users/Documents/', all.files = T, full.names = TRUE)
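To illustrate what `full.names = TRUE` changes, here is a minimal sketch using a scratch directory created just for the demonstration (the directory name `full_names_demo` and the file names are hypothetical):

```r
# Scratch directory with two empty csv files, created only for this demo
demo_dir <- file.path(tempdir(), "full_names_demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, c("a.csv", "b.csv")))

# Bare file names: only readable if the working directory is demo_dir
bare <- list.files(demo_dir, pattern = "\\.csv$")

# Paths prefixed with demo_dir: usable from any working directory
paths <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

print(bare)
print(file.exists(paths))
```

Because `paths` already includes the directory, it can be passed straight to `read.csv()` without pasting path pieces together by hand.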
aashish