2

How can I import multiple CSV files (15000 CSV files) in R and delete certain columns in each file and save it?

aborruso
  • 4,938
  • 3
  • 23
  • 40
  • All the CSV files have the same headings – Amin Hosseini Sep 02 '20 at 17:24
  • Does this answer your question? [How to import multiple .csv files at once?](https://stackoverflow.com/questions/11433432/how-to-import-multiple-csv-files-at-once) – Phil Sep 02 '20 at 17:50
  • The best code that I found is as follow but every time I receive the same error that my column doesn't exist library("purrr") library("dplyr") file_list <- list.files(pattern = "\\.csv$") walk(file_list, function(f) { file_name <- paste0("filtered_", basename(f)) read.table(f, header=TRUE, sep="\t", stringsAsFactors=FALSE) %>% select("-Event") %>% write.table(file_name, sep="\t", col.names=TRUE, row.names=FALSE, quote=FALSE) }) – Amin Hosseini Sep 02 '20 at 17:50

1 Answers1

1

You could automatize a read-manipulate-write approach.

Say, your .csvs are in your working directory.

ldir <- getwd()    

Store their names in a vector.

## list names
nm <- paste0(ldir, dir(pattern="tab\\d"))  ## list names
nm <- paste0(ldir, c("tab1.csv", "tab2.csv", "tab3.csv", "tab4.csv"))  ## or hard coded

First let's inspect heads of our example .csvs.

## inspect head of each .csv
lapply(nm, function(i) head(read.csv(i), 3))
# [[1]]
#           X1        X2         X3        X4
# 1  1.3709584  1.304870 -0.3066386 0.4554501
# 2 -0.5646982  2.286645 -1.7813084 0.7048373
# 3  0.3631284 -1.388861 -0.1719174 1.0351035
# 
# [[2]]
#           X1         X2         X3          X4
# 1  0.2059986  0.3219253 -0.3672346 -1.04311894
# 2 -0.3610573 -0.7838389  0.1852306 -0.09018639
# 3  0.7581632  1.5757275  0.5818237  0.62351816
# 
# [[3]]
#           X1         X2        X3          X4
# 1 1.51270701  1.3921164  1.200965 -0.02509255
# 2 0.25792144 -0.4761739  1.044751  0.10807273
# 3 0.08844023  0.6503486 -1.003209 -0.48543524
# 
# [[4]]
#           X1            X2         X3          X4
# 1 -1.4936251  5.676206e-01 -0.0861073 -0.04069848
# 2 -1.4704357 -4.928774e-01 -0.8876790 -1.55154482
# 3  0.1247024  6.288407e-05 -0.4446840  1.16716955

Then, put names of columns to delete in another vector.

## select columns to delete
to.del <- c("X2", "X4")

Finally, we use mapply to multivariately loop through both vectors. In FUN, there are two versions of write.csv, one that overwrites the original .csvs, and one that makes a copy.

## read, delete, and write back
mapply(function(x, y) {
  r <- read.csv(x)
  # write.csv(r[, !names(r) %in% y], row.names=FALSE, file=x)  ## use to override
  write.csv(r[, !names(r) %in% y], row.names=FALSE, 
              file=paste0(dirname(x), "copy_of_", basename(x)))  ## use for copy
  }, nm, list(to.del))

Let's check, if everything worked as expected.

## check result
lapply(paste0(dirname(nm), "copy_of_", basename(nm)), function(i) 
  head(read.csv(i), 3))
# [[1]]
#           X1         X3
# 1  1.3709584 -0.3066386
# 2 -0.5646982 -1.7813084
# 3  0.3631284 -0.1719174
# 
# [[2]]
#           X1         X3
# 1  0.2059986 -0.3672346
# 2 -0.3610573  0.1852306
# 3  0.7581632  0.5818237
# 
# [[3]]
#           X1        X3
# 1 1.51270701  1.200965
# 2 0.25792144  1.044751
# 3 0.08844023 -1.003209
# 
# [[4]]
#           X1         X3
# 1 -1.4936251 -0.0861073
# 2 -1.4704357 -0.8876790
# 3  0.1247024 -0.4446840

Yap.


Example data:

## generate example .csvs in wd
set.seed(42)
sapply(1:4, function(i) write.csv(data.frame(matrix(rnorm(10*4), 10, 4)), 
                                  row.names=FALSE, file=paste0("tab", i, ".csv")))
jay.sf
  • 60,139
  • 8
  • 53
  • 110
  • When I try to run the lapply part, I receive the following error Error during wrapup: unimplemented type (29) in 'eval' Error: no more error handlers available (recursive errors?); invoking 'abort' restart Error during wrapup: INTEGER() can only be applied to a 'integer', not a 'unknown type #29' Error: no more error handlers available (recursive errors?); invoking 'abort' restart Browse[1]> – Amin Hosseini Sep 02 '20 at 20:02
  • Error during wrapup: cannot open the connection – Amin Hosseini Sep 02 '20 at 20:25
  • @AminHosseini Code is working for me. Could you provide `sessionInfo()`? – jay.sf Sep 02 '20 at 21:13
  • here it is: R version 4.0.2 (2020-06-22) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 15063) Matrix products: default locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C [5] LC_TIME=English_United States.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base loaded via a namespace (and not attached): [1] compiler_4.0.2 tools_4.0.2 – Amin Hosseini Sep 02 '20 at 23:39
  • @AminHosseini Looks like a fresh R session to me. I'm also running windows, win7 though, but that shouldn't make any difference. I corrected a minor issue in "## check result" `basename(nm)`, but I beleive that wasn't the problem. Sorry, I'm afraid I don't know whats going on that your connection can't be opened. :/ – jay.sf Sep 03 '20 at 06:27