0

I want to read multiple .csv files from different directories and then combine them into a single data frame.

I have two kinds of directories to read:

A:/LogIIS/FOLDER01/"files.csv"


In others there are subfolders, each containing several .csv files, as in the example below:

A:/LogIIS/FOLDER02/FOLDER_A/"files.csv"

A:/LogIIS/FOLDER02/FOLDER_B/"files.csv"

A:/LogIIS/FOLDER02/FOLDER_C/"files.csv"


A:/LogIIS/FOLDER03/FOLDER_A/"files.csv"

A:/LogIIS/FOLDER03/FOLDER_B/"files.csv"

A:/LogIIS/FOLDER03/FOLDER_C/"files.csv"

A:/LogIIS/FOLDER03/FOLDER_D/"files.csv"


Thanks in advance!

Helio Roots

2 Answers

3

If you need to explicitly define a file pattern (a file name, or extension), you can use the pattern parameter in the list.files function.

library(data.table)

# make an explicit list of folders
folders = list(
  file.path('A:','LogIIS','FOLDER02','FOLDER_A'),
  file.path('A:','LogIIS','FOLDER02','FOLDER_B'),
  file.path('A:','LogIIS','FOLDER02','FOLDER_C'),
  file.path('A:','LogIIS','FOLDER03','FOLDER_A'),
  file.path('A:','LogIIS','FOLDER03','FOLDER_B'),
  file.path('A:','LogIIS','FOLDER03','FOLDER_C'),
  file.path('A:','LogIIS','FOLDER03','FOLDER_D')
)

# iterate through each folder in list and return all files
# unlist those lists of files into a single vector
files = unlist(sapply(folders, function(folder) {
  list.files(folder, full.names=TRUE)
}))

# read each file into a data.table
# return data.table results as a list
# combine list into a single data.table
rbindlist(use.names=TRUE, fill=FALSE,
  lapply(files, function(x) { 
    fread(x)  
  }) 
)
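As a hedged alternative to listing every folder by hand, the `pattern` and `recursive` arguments of `list.files` can collect every .csv file under a top-level directory, whatever its depth. The sketch below uses base R only; the helper name `read_all_csv` and the extra `source_file` column are made up for illustration, and it assumes all files share the same columns.

```r
# Hypothetical helper: find every .csv file under `root`, at any depth,
# and stack them into one data frame (assumes identical columns in all files).
read_all_csv <- function(root) {
  files <- list.files(root, pattern = "\\.csv$", recursive = TRUE,
                      full.names = TRUE, ignore.case = TRUE)
  if (length(files) == 0) return(data.frame())

  tables <- lapply(files, function(f) {
    df <- read.csv(f, stringsAsFactors = FALSE)
    df$source_file <- rep(f, nrow(df))  # remember which file each row came from
    df
  })

  # bind once at the end instead of growing a data frame inside a loop
  do.call(rbind, tables)
}
```

Calling `read_all_csv("A:/LogIIS")` would then pick up both the files directly inside FOLDER01 and those nested under FOLDER02 and FOLDER03. With data.table loaded, `fread` and `rbindlist` could take the place of `read.csv` and `do.call(rbind, ...)`.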
Mikuana
  • @Mikuma, the code is returning " Error in fread(x) : Expected sep (' ') but new line, EOF (or other non printing character) ends field 5 when detecting types from point 0: #Software: Microsoft Internet Information Services 8.5 " – Helio Roots May 11 '17 at 18:25
  • @helio7sr it looks like you have some irregular characters in your CSV files. You may need to either look into cleansing your CSV files, or changing the way that `fread` interprets them. Try the command `?data.table::fread` to read more about the parameters for interpreting your source files. – Mikuana May 12 '17 at 11:54
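The `#Software: Microsoft Internet Information Services 8.5` text in the error suggests the files may be IIS logs in the W3C extended format, where metadata lines start with `#` and fields are separated by spaces. That is an assumption based only on the error message. Under that assumption, a minimal base R sketch could drop the `#` lines and take column names from the `#Fields:` line; the helper name `read_iis_log` is hypothetical.

```r
# Hypothetical helper: read one log file whose metadata lines start with '#'
# and whose data fields are separated by single spaces (assumed W3C-style layout).
read_iis_log <- function(path) {
  lines <- readLines(path, warn = FALSE)

  # keep only non-empty data lines, i.e. those not starting with '#'
  body <- lines[!grepl("^#", lines) & nzchar(lines)]
  if (length(body) == 0) return(data.frame())

  df <- read.table(text = body, sep = " ", header = FALSE,
                   stringsAsFactors = FALSE, comment.char = "", quote = "")

  # use the last "#Fields:" line for column names, if it matches the data width
  fields_line <- grep("^#Fields:", lines, value = TRUE)
  if (length(fields_line) > 0) {
    col_names <- strsplit(sub("^#Fields:\\s*", "", tail(fields_line, 1)), "\\s+")[[1]]
    if (length(col_names) == ncol(df)) names(df) <- col_names
  }
  df
}
```

The result of `lapply(files, read_iis_log)` could then be passed to `rbindlist` in place of the `fread` results.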
0

I would also use the list.files() function, with a loop to extract all the information. First, list all directories under the common top-level directory, in this case A:/LogIIS:

common_path = "A:/LogIIS/"
primary_dirs = list.files(common_path);
primary_dirs 
[1] "FOLDER01" "FOLDER02" "FOLDER03"

Next I would do a nested loop over all primary_dirs. In your example all the .csv files share the name files.csv, which simplifies the problem. You also haven't said how the files should be combined, so I will assume they have the same column headers and use rbind() to stack their rows; cbind() would only be appropriate if each file held different columns for the same rows.

main_data = data.frame()  ## start empty; rbind() takes its columns from the first file read

using the answer from here

for(dir in primary_dirs) {
  dir_path = paste(common_path,dir,sep = "")
  sub_folders = list.files(dir_path)
  if ("files.csv" %in% sub_folders) {
    ## files.csv sits directly in this directory: read it and append its rows
    temp_data = read.csv(file = paste(dir_path,"/files.csv",sep = ""))
    main_data = rbind(main_data,temp_data)
  } else {
    ## otherwise try going one directory deeper
    for(sub_dir in sub_folders) {
      sub_path = paste(dir_path,"/",sub_dir,sep = "")
      if ("files.csv" %in% list.files(sub_path)) {
        ## found files.csv: read it and append its rows
        temp_data = read.csv(file = paste(sub_path,"/files.csv",sep = ""))
        main_data = rbind(main_data,temp_data)
      } else {
        warning("could not find the file 'files.csv' two directories deep")
      }
    }
  }
}
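To make the choice between rbind() and cbind() concrete, here is a small sketch on two made-up data frames (the column names `id` and `status` are invented for illustration). Files that share the same column headers are normally stacked with rbind(), while cbind() places them side by side as extra columns.

```r
# Two small made-up tables with the same column headers
a <- data.frame(id = 1:2, status = c(200, 404))
b <- data.frame(id = 3:4, status = c(200, 500))

stacked      <- rbind(a, b)  # rows of b appended under a: 4 rows, 2 columns
side_by_side <- cbind(a, b)  # columns of b placed next to a: 2 rows, 4 columns

dim(stacked)
dim(side_by_side)
```

Note that cbind() also requires compatible row counts, so it tends to fail or misalign when the files have different lengths.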
Cyrillm_44