Combining multiple csv files together in an r loop

Question

I have a folder with many csv files in it. They are all structured as per picture

I am interested to count the numbers under my variable "Type" and get an output that tells me that there are two number 7, two number 9, two 1 and so on. I want to do this for the csv files in my folder, and it would be great to bind the outputs from different files together (with an identifier to the original file the output was extracted from). So far, I managed to do it for individual files with this code:

mydata <- read.csv("1_data.csv", skip=1, header = T)
df <- data.frame(table(mydata$Type))

However, I tried to code a loop and got stucked. This is the code I am using:

files = list.files(pattern = "*.csv")

for (i in files) {
  id <- substr(i, 1, 5)
  mydata <- read.csv (i, skip=1, header = T)
  datatobind <- data.frame(table(mydata$Type))
  datatobind["id"] <- as.numeric(id)
  data <- rbind(data, datatobind)
}
do.call (rbind, data)

write.csv(data, file='final.csv', row.names=FALSE)

I get a different error every time I try to change the code, so I am not sure how to fix this.

try reading the files to a list first, using something like `L <- lapply( files, data.table::fread, skip = 1 )`, and then rowbind the list using `DT <- data.table::rbindlist(L, use.names = TRUE, fill = TRUE )` — Wimpel, Sep 09 '20 at 09:27
https://stackoverflow.com/questions/11433432/how-to-import-multiple-csv-files-at-once — jogo, Sep 09 '20 at 09:29
Hi both, thanks for your reply. Unfortunately, I am interested in merging the outputs together (so after I counted the numbers under my "Type" variable). This is because I am trying to identify errors and I need to see in which file the errors are. The binded output CSV file I have in mind should contain a header with the file name or a blank row to make it clear. - but this would be after I figure out how to do the loop — PsychSilvia, Sep 09 '20 at 10:09

score 0 · Accepted Answer · answered Sep 10 '20 at 07:54

Here are couple of ways to do count Type column from each file, add a new column with the filename and bind the output together.

Using base R :

files = list.files(pattern = "*.csv", full.names = TRUE)

new_data <- do.call(rbind, lapply(files, function(x) {
                  mydata <- read.csv(x, skip=1, header = TRUE)
                  transform(as.data.frame(table(mydata$Type)), 
                            filename = basename(x))
            }))

and with tidyverse :

library(dplyr)

new_data <- purrr::map_df(files, function(x) {
  mydata <- read.csv(x, skip=1, header = TRUE)
  mydata %>%
    count(Type) %>%
    mutate(filename = basename(x))
})

score 0 · Answer 2 · answered Feb 26 '21 at 10:18

Here is a parallel version that suits your needs. You may need to install doSNOW and parallel packages:

library(doSNOW)
library(parallel)

setwd("path/to/folder")

all_files = list.files(pattern = "\\.csv$")
num_files = length(all_files)

cl <- makeCluster(min(num_files, floor(detectCores()*0.9)), outfile = "")
registerDoSNOW(cl)
dataset <- foreach(i=1:num_files, .combine='rbind') %dopar% 
{
  read.csv(all_files[i], header=TRUE)
}
stopCluster(cl)
registerDoSEQ()

write.csv(dataset, file='final.csv', row.names=FALSE)

Tested on Windows 10 x64, with massive speedup vs regular loop.

Combining multiple csv files together in an r loop

2 Answers2