Operations in nested loops within variables of filename format in R

Question

The following code runs fine in R. I generated these lines by using the 'disp' function inside a nested loop (SID, session) in MATLAB.

....
S09_06<- read.csv("09_06.csv", header=TRUE,sep=",")
S10_01<- read.csv(file="10_01.csv",header=TRUE, sep=",")
...

So the workflow is now a combination of R and MATLAB, which is not very efficient.

There should be a way to do this in R alone.

I have tried the code below several times, with small modifications based on possible solutions found on the internet:

SID = c(....9, 10,....)
S = 1:6

for (value in SID){
  if (value < 10)
  for (value in S){
  paste0("S0",SID,"_0",S)  = read.csv(file = paste("0",SID,"_0", S, ".csv"), header=TRUE,sep=",")
  }else{
  paste0("S",SID,"_0",S)  = read.csv(file = paste("0",SID,"_0", S, ".csv"), header=TRUE,sep=",")
  }

}

However, the following error message appears every time:

"Error in file(file, "rt") : invalid 'description' argument"

How can I make this work?

Thank you.

[Don't ever create d1 d2 d3, ..., dn in the first place. Create a list d with n elements.](https://stackoverflow.com/a/24376207/1422451) — Parfait, Sep 15 '19 at 16:21

Parfait · Answer 1 · 2019-09-15T20:29:26.080

0

Consider building a list of data frames by first building a vector of file names and passing them into an lapply or sapply call. Below uses sapply to generate a named (vs. unnamed) data frame list.

# CREATE VECTOR OF FILE NAMES (PASSING TWO VECTORS)
filenames <- as.vector(sapply(SID, 
                              function(x,y) paste0("S", ifelse(x < 10, 
                                                               paste0("0", x),
                                                               paste(x)), 
                                                   "_0", y),
                              S)
                      )

# CREATE NAMED LIST OF DATA FRAMES (PASSING ONE VECTOR) 
df_list <- sapply(filenames, function(i) {
                     fname <- paste0(substr(i, 2, nchar(i)),".csv")
                     read.csv(fname, header=TRUE, sep=",")
                  }, simplify=FALSE)

# ACCESS INDIVIDUAL DATA FRAMES    
df_list$S09_01  
df_list$S09_02
df_list$S09_03
...
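For comparison, the same named list can also be built with an explicit nested for loop. This is only a sketch: the helper name `read_sessions` and its `dir` argument are made up for illustration, and files that do not exist are simply skipped. In the question's loop, `SID` and `S` inside `paste()` refer to the whole vectors rather than the loop variable, so `read.csv` appears to receive several file names at once, which would be consistent with the "invalid 'description' argument" error.

```r
# Hypothetical helper: read every "<SID>_<S>.csv" found in `dir` into one named list.
read_sessions <- function(SID, S, dir = ".") {
  out <- list()
  for (sid in SID) {                          # outer loop: participants
    for (s in S) {                            # inner loop: sessions
      base <- sprintf("%02d_%02d", sid, s)    # zero-padded, e.g. "09_06"
      path <- file.path(dir, paste0(base, ".csv"))
      if (file.exists(path)) {                # skip missing files
        out[[paste0("S", base)]] <- read.csv(path, header = TRUE, sep = ",")
      }
    }
  }
  out
}

# Usage (reads from the current working directory):
df_list <- read_sessions(SID = c(9, 10), S = 1:6)
```

The individual data frames are then available as `df_list$S09_06` and so on, without creating one variable per file.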

edited Sep 15 '19 at 20:29

answered Sep 15 '19 at 16:41

Parfait


Thank you @Parfait, df_list was created as a nested list with SID and S as mentioned. However, the content of each .csv could not be read: every element shows "A data.frame with 0 rows and 1 column" in the Value column when viewing df_list, and "No data available in table" when viewing df_list$SID_S. There should be a way to deal with the nested loop puzzle and do a simple operation like read.csv here in R. Let's try and keep some faith in R still : ) – chew Sep 15 '19 at 17:44
I updated the code to pass your original args *header* and *sep*, which I thought were redundant as defaults but may differ in your language/region settings. – Parfait Sep 15 '19 at 20:33
Thank you, @Parfait. After restarting R, your code is working. Now I can move on to further steps such as selecting the necessary columns, replacing some values and so on, which I previously handled with R + MATLAB : ) – chew Sep 16 '19 at 16:00
Yes, I pressed the upvote icon several times and it shows "Thanks for the feedback! Votes cast by those with less than 15 reputation are recorded, but do not change the publicly displayed post score." Sorry that I only have 11 now... But I do appreciate your nice code, which helped me regain some confidence in R : ) Have a nice day – chew Sep 17 '19 at 08:34

score 0 · Answer 2 · edited Jun 20 '20 at 09:12

This question touches several issues:

1. How to read a large number of csv files in one go,
2. How to create a vector of filenames from 2 integer input vectors in the proper format.

Item 1. has been asked and answered many times before and is also the core of Parfait's answer.

This answer focuses on Item 2.

As an added complexity, the filenames are following the scheme "09_06.csv", "10_01.csv" (including the file extension) while the resulting data.frames are to be named "S09_06", "S10_01" (with a leading "S" but without the file extension).

The creation of basenames (without prefix and file extension) in the proper format can be simplified by using outer() and sprintf():

SID = c(9, 10)
S = 1:6
outer(SID, S, sprintf, fmt = "%02i_%02i")

     [,1]    [,2]    [,3]    [,4]    [,5]    [,6]   
[1,] "09_01" "09_02" "09_03" "09_04" "09_05" "09_06"
[2,] "10_01" "10_02" "10_03" "10_04" "10_05" "10_06"

The conversion specifier %02i denotes a field width of 2 characters and that the output is to be padded with leading 0s.
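A minimal sketch of how the padding behaves on its own (the variable names are only for illustration):

```r
# %02i pads to a width of 2 with leading zeros; plain %i does not pad.
padded   <- sprintf("%02i", c(1L, 9L, 10L))   # "01" "09" "10"
unpadded <- sprintf("%i", 9L)                 # "9"

# Two specifiers in one format string consume two arguments in order.
base <- sprintf("%02i_%02i", 9L, 6L)          # "09_06"
```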

Now, the list of data.frames can be created by 3 lines of code:

basenames <- outer(SID, S, sprintf, fmt = "%02i_%02i")
df_list <- lapply(paste0(basenames, ".csv"), read.csv, header = TRUE, sep = ",")
names(df_list) <- paste0("S", basenames)

For demonstration only, and to avoid having to create many csv files beforehand, the print() function is used here instead of read.csv():

basenames <- outer(SID, S, sprintf, fmt = "%02i_%02i")
df_list <- lapply(paste0(basenames, ".csv"), print) # just for demonstration
names(df_list) <- paste0("S", basenames)
df_list

$S09_01
[1] "09_01.csv"

$S10_01
[1] "10_01.csv"

$S09_02
[1] "09_02.csv"

$S10_02
[1] "10_02.csv"

$S09_03
[1] "09_03.csv"

$S10_03
[1] "10_03.csv"

$S09_04
[1] "09_04.csv"

$S10_04
[1] "10_04.csv"

$S09_05
[1] "09_05.csv"

$S10_05
[1] "10_05.csv"

$S09_06
[1] "09_06.csv"

$S10_06
[1] "10_06.csv"

Create one data.frame

The OP has mentioned that he wants to "select necessary columns, replace some values". This sounds as if all files have an identical structure, i.e., the same number, order, names, and type of columns.

If all files do have the same structure, I would combine them in one large data.frame. This is easier to handle than to apply all operations on a list of data.frames.

This is what I would do with my preferred tools:

library(data.table)
library(magrittr)
SID = c(9, 10)
S = 1:6
filenames <- CJ(SID, S)[, sprintf("%02i_%02i.csv", SID, S)]
lapply(filenames, fread) %>% 
  set_names(filenames) %>% 
  rbindlist(idcol = "file")

         file V1 V2          V3
 1: 09_01.csv  Y 39 -0.83562861
 2: 09_01.csv  D  1  1.59528080
 3: 09_02.csv  V 74  1.51178117
 4: 09_02.csv  N  7  0.38984324
 5: 09_03.csv  O 84  0.59390132
 6: 09_03.csv  A 35  0.91897737
 7: 09_04.csv  F 40 -1.47075238
 8: 09_04.csv  Y 44 -0.47815006
 9: 09_05.csv  B 18 -0.41499456
10: 09_05.csv  M 22 -0.39428995
11: 09_06.csv  G 81 -1.16657055
12: 09_06.csv  K 13 -1.06559058
13: 10_01.csv  N 59  0.48742905
14: 10_01.csv  R 51  0.73832471
15: 10_02.csv  I 37 -0.04493361
16: 10_02.csv  Y 34 -0.01619026
17: 10_03.csv  O 28 -1.98935170
18: 10_03.csv  T 20  0.61982575
19: 10_04.csv  Z 51 -0.10278773
20: 10_04.csv  G 42  0.38767161
21: 10_05.csv  S 70  0.76317575
22: 10_05.csv  H 87 -0.16452360
23: 10_06.csv  W 84 -0.11234621
24: 10_06.csv  N 29  0.88110773
         file V1 V2          V3

Note that the first column contains the filename from which the row originated.
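For readers who prefer to stay in base R, roughly the same result can be sketched without data.table. The two small data frames below are stand-ins for files read from disk, and the column name `file` is chosen to mirror the `idcol` above; this assumes all data frames share the same columns.

```r
# Stand-in for a named list of data frames as returned by lapply(..., read.csv)
df_list <- list(
  S09_01 = data.frame(V1 = c("Y", "D"), V2 = c(39L, 1L), stringsAsFactors = FALSE),
  S09_02 = data.frame(V1 = c("V", "N"), V2 = c(74L, 7L), stringsAsFactors = FALSE)
)

# Prepend the list name to each data frame as an id column ...
tagged <- Map(
  function(d, id) data.frame(file = id, d, stringsAsFactors = FALSE),
  df_list, names(df_list)
)

# ... then stack all of them row-wise into one data frame.
combined <- do.call(rbind, tagged)
rownames(combined) <- NULL
```

For a large number of files, `rbindlist()` is likely to be faster than `do.call(rbind, ...)`, so this is meant as a dependency-free alternative rather than a replacement.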

Data

Sample files were created by

library(data.table)
library(magrittr)
SID = c(9, 10)
S = 1:6
fn <- outer(SID, S, sprintf, fmt = "%02i_%02i.csv")
set.seed(1L)
nr = 2L
dfl <- replicate(
  length(SID)*length(S), 
  data.frame(V1 = sample(LETTERS, nr), V2 = sample.int(100, nr), V3 = rnorm(nr)),
  simplify = FALSE
  ) %>% 
  set_names(fn) 
lapply(fn, function(x) fwrite(dfl[[x]], file = x))

Thank you, @Uwe. Your explanation is easy to understand, and I got the same result with your code for the .csv name list (SID_S). However, the content of each .csv could not be read when running 'df_list <- lapply(paste0(basenames, ".csv"), read.csv, header = TRUE, sep = ",")'. Error message: "Error in file(file, "rt") : cannot open the connection". df_list@Parfait could make it : ) – chew, Sep 16 '19 at 16:19
That happened because I had overlooked that your filenames are *not* prefixed by `"S"`. I have corrected this. – Uwe, Sep 17 '19 at 06:23
Thank you, @Uwe. We are the guys who always want to make code work : ) However, my data format is UTF-16, which is not supported by fread. Endless nested loops can be written effortlessly in MATLAB, and apparently that is not the case in R... Thank you again for pointing out which steps to take next. Have a nice day : ) – chew, Sep 17 '19 at 09:15
Oh, I see. UTF-16 seems to have been an issue (https://github.com/Rdatatable/data.table/issues/2560) indeed but according to https://github.com/Rdatatable/data.table/issues/2435 the development version 1.12.3 of data.table has fixed the issue, IIUC. — Uwe, Sep 17 '19 at 10:07
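As a possible workaround for the UTF-16 issue, base R's read.csv() accepts a `fileEncoding` argument. The sketch below writes a tiny UTF-16LE file to a temporary path so that it is self-contained. The exact encoding name is an assumption: files that start with a byte-order mark may need "UTF-16" instead of "UTF-16LE", depending on how they were produced.

```r
# Write a small UTF-16LE csv file (temporary path, made-up content).
path <- tempfile(fileext = ".csv")
con <- file(path, open = "w", encoding = "UTF-16LE")
writeLines(c("id,score", "1,0.5", "2,0.8"), con)
close(con)

# Read it back, telling read.csv how the file is encoded.
dat <- read.csv(path, header = TRUE, sep = ",", fileEncoding = "UTF-16LE")
```

The same `fileEncoding` argument can be passed through lapply(), e.g. `lapply(files, read.csv, fileEncoding = "UTF-16LE")`.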

score 0 · Accepted Answer · answered Sep 17 '19 at 12:27

Thank you both, @Parfait and @Uwe.

There are hundreds of steps after read.csv, and the difficulty of writing nested loops in R drove me to MATLAB to make these tricks work instead.

Although I already have the results I want, the workflow is only semi-automatic. Whenever new participants are added, every loop-related step has to be modified in MATLAB again and the output put back into R to run.

There should be a way to do all of this solely in R, so that the more than 100 thousand lines of R code become shorter and more efficient.

Every value needs to be calculated individually under different conditions before running ANOVA or ANCOVA. Or do you think it is better to handle this on a list basis?

For example, the missing or error rate of each participant, based on a logical test that combines several columns?

I can only make it work with R + MATLAB, but there should be a way to do it with R alone.

That is why I asked this question, and I hope there will be some solutions.

Any comments will be appreciated.

Thank you and Have a nice day : )
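The per-participant missing or error rate mentioned above could be computed on a list basis roughly as follows. This is a sketch only: the column names `response` and `correct`, the toy data and the definition of an "error" are all assumptions, since the real file layout is not shown in this thread.

```r
# Toy stand-in for a named list of data frames, one per participant/session.
df_list <- list(
  S09_01 = data.frame(response = c(1, NA, 2, 1), correct = c(1, 1, 1, 1)),
  S10_01 = data.frame(response = c(2, 2, NA, NA), correct = c(2, 1, 1, 2))
)

# One function applied to every data frame; the result has one row per list element.
rates <- t(sapply(df_list, function(d) {
  missing <- is.na(d$response)                  # no response recorded
  wrong   <- !missing & d$response != d$correct # answered, but incorrectly
  c(missing_rate = mean(missing), error_rate = mean(wrong))
}))
```

Adding a new participant then only means adding another element to `df_list`; the calculation itself does not have to be edited.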

Because the hundreds of nested loops could not be coded in R, I need to turn to MATLAB alone to get "real automatic" coding instead of "semi-automatic" coding : ) I need to say goodbye to R now. However, R plotting is still better than the others : ) – chew, Sep 18 '19 at 16:11
