This question touches several issues:
1. How to read a large number of csv files in one go,
2. How to create a vector of filenames in the proper format from 2 integer input vectors.
Item 1 has been asked and answered many times before and is also the core of Parfait's answer.
This answer focuses on Item 2.
As an added complexity, the filenames follow the scheme "09_06.csv", "10_01.csv" (including the file extension), while the resulting data.frames are to be named "S09_06", "S10_01" (with a leading "S" but without the file extension).
The creation of basenames (without prefix and file extension) in the proper format can be simplified by using outer() and sprintf():
SID = c(9, 10)
S = 1:6
outer(SID, S, sprintf, fmt = "%02i_%02i")
     [,1]    [,2]    [,3]    [,4]    [,5]    [,6]
[1,] "09_01" "09_02" "09_03" "09_04" "09_05" "09_06"
[2,] "10_01" "10_02" "10_03" "10_04" "10_05" "10_06"
The conversion specifier %02i denotes an integer with a minimum field width of 2 characters, padded with leading 0s.
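As a small illustration of how the flag and width in %02i behave, here is a sketch with made-up input values (the variable names are only for this example):

```r
# Zero-padded to a minimum width of 2
padded <- sprintf("%02i", c(9L, 10L))

# Without the 0 flag, the field is padded with spaces instead
spaced <- sprintf("%2i", 9L)

# The width is a minimum: longer numbers are never truncated
too_wide <- sprintf("%02i", 123L)

# Two specifiers in one format string, as used for the basenames
combined <- sprintf("%02i_%02i", 9L, 6L)

print(padded)
print(spaced)
print(too_wide)
print(combined)
```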
Now, the list of data.frames can be created with 3 lines of code:
basenames <- outer(SID, S, sprintf, fmt = "%02i_%02i")
df_list <- lapply(paste0(basenames, ".csv"), read.csv, header = TRUE, sep = ",")
names(df_list) <- paste0("S", basenames)
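If some of the generated filenames might not exist in the working directory, read.csv() would stop with an error at the first missing file. A possible guard, sketched here under the assumption that missing files should simply be skipped (the variables files and found are introduced only for this example):

```r
SID <- c(9, 10)
S <- 1:6

# as.vector() drops the matrix dimensions returned by outer()
basenames <- as.vector(outer(SID, S, sprintf, fmt = "%02i_%02i"))
files <- paste0(basenames, ".csv")

# Keep only those files that are actually present
found <- file.exists(files)
if (!all(found)) {
  message("Skipping missing files: ", paste(files[!found], collapse = ", "))
}

df_list <- lapply(files[found], read.csv)
names(df_list) <- paste0("S", basenames[found])
```

Whether skipping is appropriate depends on the use case; stopping with an error may be the safer choice if every file is expected to exist.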
Just for demonstration, and to avoid having to create many csv files beforehand, the print() function is used instead of read.csv():
basenames <- outer(SID, S, sprintf, fmt = "%02i_%02i")
df_list <- lapply(paste0(basenames, ".csv"), print) # just for demonstration
names(df_list) <- paste0("S", basenames)
df_list
$S09_01
[1] "09_01.csv"
$S10_01
[1] "10_01.csv"
$S09_02
[1] "09_02.csv"
$S10_02
[1] "10_02.csv"
$S09_03
[1] "09_03.csv"
$S10_03
[1] "10_03.csv"
$S09_04
[1] "09_04.csv"
$S10_04
[1] "10_04.csv"
$S09_05
[1] "09_05.csv"
$S10_05
[1] "10_05.csv"
$S09_06
[1] "09_06.csv"
$S10_06
[1] "10_06.csv"
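The list elements above alternate between 09 and 10 because outer() returns a matrix, and R flattens matrices column by column. If the files should be processed grouped by the first index instead, transposing before flattening is one option. A minimal sketch (col_major and row_major are names chosen for this example):

```r
SID <- c(9, 10)
S <- 1:6
basenames <- outer(SID, S, sprintf, fmt = "%02i_%02i")

# Default flattening is column-major: 09_01, 10_01, 09_02, 10_02, ...
col_major <- as.vector(basenames)

# Transposing first yields row-major order: 09_01, ..., 09_06, 10_01, ...
row_major <- as.vector(t(basenames))

print(col_major)
print(row_major)
```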
Create one data.frame
The OP has mentioned wanting to "select necessary columns, replace some values". This sounds as if all files have an identical structure, i.e., the same number, order, names, and types of columns.
If all files do have the same structure, I would combine them into one large data.frame. This is easier to handle than applying every operation to a list of data.frames.
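For readers who prefer to stay in base R, the same idea can be sketched with Map() and do.call(rbind, ...). The two toy data.frames below are made up for illustration and stand in for the list of files read from disk:

```r
# Toy stand-in for a named list of identically structured data.frames
df_list <- list(
  S09_01 = data.frame(V1 = c("a", "b"), V2 = 1:2),
  S09_02 = data.frame(V1 = c("c", "d"), V2 = 3:4)
)

# Prepend the list name as an id column to each data.frame
tagged <- Map(function(df, id) cbind(id = id, df), df_list, names(df_list))

# Stack all pieces into one data.frame and drop the generated row names
combined <- do.call(rbind, tagged)
rownames(combined) <- NULL

print(combined)
```

For many or large files this tends to be slower than dedicated functions for binding lists of tables.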
This is what I would do with my preferred tools:
library(data.table)
library(magrittr)
SID = c(9, 10)
S = 1:6
filenames <- CJ(SID, S)[, sprintf("%02i_%02i.csv", SID, S)]
lapply(filenames, fread) %>%
set_names(filenames) %>%
rbindlist(idcol = "file")
file V1 V2 V3
1: 09_01.csv Y 39 -0.83562861
2: 09_01.csv D 1 1.59528080
3: 09_02.csv V 74 1.51178117
4: 09_02.csv N 7 0.38984324
5: 09_03.csv O 84 0.59390132
6: 09_03.csv A 35 0.91897737
7: 09_04.csv F 40 -1.47075238
8: 09_04.csv Y 44 -0.47815006
9: 09_05.csv B 18 -0.41499456
10: 09_05.csv M 22 -0.39428995
11: 09_06.csv G 81 -1.16657055
12: 09_06.csv K 13 -1.06559058
13: 10_01.csv N 59 0.48742905
14: 10_01.csv R 51 0.73832471
15: 10_02.csv I 37 -0.04493361
16: 10_02.csv Y 34 -0.01619026
17: 10_03.csv O 28 -1.98935170
18: 10_03.csv T 20 0.61982575
19: 10_04.csv Z 51 -0.10278773
20: 10_04.csv G 42 0.38767161
21: 10_05.csv S 70 0.76317575
22: 10_05.csv H 87 -0.16452360
23: 10_06.csv W 84 -0.11234621
24: 10_06.csv N 29 0.88110773
file V1 V2 V3
Note that the first column contains the filename from which the row originated.
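Once everything is in one table, the kind of post-processing the OP mentioned can be applied in a single pass. The following is only a sketch on a hand-made three-row table; the replacement rule (setting V2 to NA where V1 is "D") and the object names combined and result are assumptions for illustration, not taken from the question:

```r
library(data.table)

# Hand-made stand-in for the combined table
combined <- data.table(
  file = c("09_01.csv", "09_01.csv", "10_06.csv"),
  V1   = c("Y", "D", "W"),
  V2   = c(39L, 1L, 84L)
)

# Recover both indices from the filename: strip the extension, split at "_"
combined[, c("SID", "S") := tstrsplit(sub("\\.csv$", "", file), "_",
                                      fixed = TRUE, type.convert = TRUE)]

# Replace some values by reference (hypothetical rule)
combined[V1 == "D", V2 := NA_integer_]

# Select only the columns needed
result <- combined[, .(SID, S, V1, V2)]
print(result)
```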
Data
The sample files were created by:
library(data.table)
library(magrittr)
SID = c(9, 10)
S = 1:6
fn <- outer(SID, S, sprintf, fmt = "%02i_%02i.csv")
set.seed(1L)
nr = 2L
dfl <- replicate(
length(SID)*length(S),
data.frame(V1 = sample(LETTERS, nr), V2 = sample.int(100, nr), V3 = rnorm(nr)),
simplify = FALSE
) %>%
set_names(fn)
lapply(fn, function(x) fwrite(dfl[[x]], file = x))