How can I import multiple CSV files (15000 CSV files) in R and delete certain columns in each file and save it?
Asked
Active
Viewed 1,096 times
2
-
All the CSV files have the same headings – Amin Hosseini Sep 02 '20 at 17:24
-
Does this answer your question? [How to import multiple .csv files at once?](https://stackoverflow.com/questions/11433432/how-to-import-multiple-csv-files-at-once) – Phil Sep 02 '20 at 17:50
-
The best code that I found is as follow but every time I receive the same error that my column doesn't exist library("purrr") library("dplyr") file_list <- list.files(pattern = "\\.csv$") walk(file_list, function(f) { file_name <- paste0("filtered_", basename(f)) read.table(f, header=TRUE, sep="\t", stringsAsFactors=FALSE) %>% select("-Event") %>% write.table(file_name, sep="\t", col.names=TRUE, row.names=FALSE, quote=FALSE) }) – Amin Hosseini Sep 02 '20 at 17:50
1 Answers
1
You could automatize a read-manipulate-write approach.
Say, your .csvs are in your working directory.
ldir <- getwd()
Store their names in a vector.
## list names
nm <- paste0(ldir, dir(pattern="tab\\d")) ## list names
nm <- paste0(ldir, c("tab1.csv", "tab2.csv", "tab3.csv", "tab4.csv")) ## or hard coded
First let's inspect heads of our example .csvs.
## inspect head of each .csv
lapply(nm, function(i) head(read.csv(i), 3))
# [[1]]
# X1 X2 X3 X4
# 1 1.3709584 1.304870 -0.3066386 0.4554501
# 2 -0.5646982 2.286645 -1.7813084 0.7048373
# 3 0.3631284 -1.388861 -0.1719174 1.0351035
#
# [[2]]
# X1 X2 X3 X4
# 1 0.2059986 0.3219253 -0.3672346 -1.04311894
# 2 -0.3610573 -0.7838389 0.1852306 -0.09018639
# 3 0.7581632 1.5757275 0.5818237 0.62351816
#
# [[3]]
# X1 X2 X3 X4
# 1 1.51270701 1.3921164 1.200965 -0.02509255
# 2 0.25792144 -0.4761739 1.044751 0.10807273
# 3 0.08844023 0.6503486 -1.003209 -0.48543524
#
# [[4]]
# X1 X2 X3 X4
# 1 -1.4936251 5.676206e-01 -0.0861073 -0.04069848
# 2 -1.4704357 -4.928774e-01 -0.8876790 -1.55154482
# 3 0.1247024 6.288407e-05 -0.4446840 1.16716955
Then, put names of columns to delete in another vector.
## select columns to delete
to.del <- c("X2", "X4")
Finally, we use mapply
to multivariately loop through both vectors. In FUN, there are two versions of write.csv
, one that overwrites the original .csvs, and one that makes a copy.
## read, delete, and write back
mapply(function(x, y) {
r <- read.csv(x)
# write.csv(r[, !names(r) %in% y], row.names=FALSE, file=x) ## use to override
write.csv(r[, !names(r) %in% y], row.names=FALSE,
file=paste0(dirname(x), "copy_of_", basename(x))) ## use for copy
}, nm, list(to.del))
Let's check, if everything worked as expected.
## check result
lapply(paste0(dirname(nm), "copy_of_", basename(nm)), function(i)
head(read.csv(i), 3))
# [[1]]
# X1 X3
# 1 1.3709584 -0.3066386
# 2 -0.5646982 -1.7813084
# 3 0.3631284 -0.1719174
#
# [[2]]
# X1 X3
# 1 0.2059986 -0.3672346
# 2 -0.3610573 0.1852306
# 3 0.7581632 0.5818237
#
# [[3]]
# X1 X3
# 1 1.51270701 1.200965
# 2 0.25792144 1.044751
# 3 0.08844023 -1.003209
#
# [[4]]
# X1 X3
# 1 -1.4936251 -0.0861073
# 2 -1.4704357 -0.8876790
# 3 0.1247024 -0.4446840
Yap.
Example data:
## generate example .csvs in wd
set.seed(42)
sapply(1:4, function(i) write.csv(data.frame(matrix(rnorm(10*4), 10, 4)),
row.names=FALSE, file=paste0("tab", i, ".csv")))

jay.sf
- 60,139
- 8
- 53
- 110
-
When I try to run the lapply part, I receive the following error Error during wrapup: unimplemented type (29) in 'eval' Error: no more error handlers available (recursive errors?); invoking 'abort' restart Error during wrapup: INTEGER() can only be applied to a 'integer', not a 'unknown type #29' Error: no more error handlers available (recursive errors?); invoking 'abort' restart Browse[1]> – Amin Hosseini Sep 02 '20 at 20:02
-
-
@AminHosseini Code is working for me. Could you provide `sessionInfo()`? – jay.sf Sep 02 '20 at 21:13
-
here it is: R version 4.0.2 (2020-06-22) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 15063) Matrix products: default locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C [5] LC_TIME=English_United States.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base loaded via a namespace (and not attached): [1] compiler_4.0.2 tools_4.0.2 – Amin Hosseini Sep 02 '20 at 23:39
-
@AminHosseini Looks like a fresh R session to me. I'm also running windows, win7 though, but that shouldn't make any difference. I corrected a minor issue in "## check result" `basename(nm)`, but I beleive that wasn't the problem. Sorry, I'm afraid I don't know whats going on that your connection can't be opened. :/ – jay.sf Sep 03 '20 at 06:27