3

I am very new to R and am trying to use it for generating a very large number of subsets (>4000), which my current script in JSL cannot handle. i would greatly appreciate if anyone could give me pointers on going about subsetting a table like the one below and saving them as .csv using column name and row info.

set.seed(24)
DT = data.frame(Group=c("A","A","A","A","B","B","B","B","B"),Output1 = rnorm(9),Output2=rnorm(9),Output3=rnorm(9),Step=c("0","0","1","1","1","1","2","2","2"))
DT

I would like to subset this by Group,Output[i], Step and save as .csv files with a particular naming convention "column_Step_Group.csv". Examples shown below

Output1_0_A.csv

-0.5458808
0.5365853


Output2_0_A.csv
0.002311942
-1.316908124  

Output1_2_B.csv
0.4445853
-0.4664951
-0.8483700
akrun
  • 874,273
  • 37
  • 540
  • 662
  • Can you please make a reproducible example of your data? It would help a lot to have a data frame that can be copied and pasted into R so that it is unambiguous what your data structure is. The thing you pasted there is pretty confusing. – SlowLoris May 24 '16 at 05:44
  • What do you mean by "subset this by ... Output[i]"? Is "Output[i] part of the grouping logic? For which "i"? Or do you just want to put all values if the i-th output column into the result file (as indicated in the example output)? – R Yoda May 24 '16 at 05:54
  • @Slow Ioris - i have added the data frame. Thanks for your help!! – user6354897 May 24 '16 at 06:03
  • @ R Yoda, yes to the latter - would like the values of the i-th output column into the result file for the given Group and Step. Thanks for your help!! – user6354897 May 24 '16 at 06:05
  • Possible duplicate of [writing many files in a for loop using R](http://stackoverflow.com/questions/7033699/writing-many-files-in-a-for-loop-using-r) – Mateusz1981 May 24 '16 at 06:11

2 Answers2

1

Or in loop

DT = data.frame(Group=c("A","A","A","A","B","B","B","B","B"),Output1 = rnorm(9),Output2=rnorm(9),Output3=rnorm(9),Step=c("0","0","1","1","1","1","2","2","2"))

for(i in levels(DT$Group)) {
  da <- subset(DT, Group == i)
  write.csv(da, paste("Group_", i, ".csv", sep = ""),  row.names = FALSE, quote = FALSE)
}
Mateusz1981
  • 1,817
  • 17
  • 33
  • Please check the expected output in the OP's post – akrun May 24 '16 at 07:05
  • The above loop does split out the files, but i wanted to add two more constraints - selection of Step and Output[i]. I tried writing nested loops by tinkering with the code above, but am not able to subset desired columns.for (k in 1:length(column)) { for(i in levels(DT$Group)) { for(j in levels(DT$Step)) { da <- subset(DT, Group == i & Step == j & colnames(DT) == column[k]) write.csv(da, paste("Group_", i, j, column[k], ".csv", sep = ""), row.names = FALSE, quote = FALSE) } } } is there anyway select few columns can be subsetted in the final output? thanks!! – user6354897 May 24 '16 at 08:19
  • have you tested @akrun solution? – Mateusz1981 May 24 '16 at 09:01
0

We can use split to create a list of data.frames. Loop through the names of the 'lst' (lapply(names(lst),...) and use fwrite from data.table (for faster writing).

lst <- split(DT[-c(1,5)], list(DT$Group, DT$Step), drop=TRUE)
library(data.table)
invisible(lapply(names(lst), function(x) {
      l1 <- split.default(lst[[x]], names(lst[[x]]))
      lapply(names(l1), function(y)  fwrite(l1[[y]],
             paste0(y, "_", x, ".csv")))}))

The output of "Output1_A.0.csv"

 Output1
-0.545880758
0.536585304
akrun
  • 874,273
  • 37
  • 540
  • 662
  • Hello Akrun - thanks for providing this code. I tried running it as is and am getting the following error "Error in cat(list(...), file, sep, fill, labels, append) : argument 1 (type 'list') cannot be handled by 'cat' " Can you please help me understand what could be going wrong for me. Thanks in advance – user6354897 May 24 '16 at 08:14
  • @user6354897 I am not getting any error based on the example you showed – akrun May 24 '16 at 08:44
  • Was getting the error because fwrite doesnt seem to work. Here is a modification of the above code and it works great. Thanks for all your help!! – user6354897 May 24 '16 at 21:00
  • DT = data.frame(Group=c("A","A","A","A","B","B","B","B","B"),Output1 = rnorm(9),Output2=rnorm(9),Output3=rnorm(9),Step=c("0","0","1","1","1","1","2","2","2")) lst <- split(DT[-c(1,5)], list(DT$Group, DT$Step), drop=TRUE) library(data.table) invisible(lapply(names(lst), function(x) { l1 <- split.default(lst[[x]], names(lst[[x]])) lapply(names(l1), function(y) write.table(l1[[y]], paste0(y, "_", x, ".csv"), row.names=FALSE))})) – user6354897 May 24 '16 at 21:01
  • @user6354897 I guess I am using the devel version of data.table which has this function. It would be fast with fwrite. – akrun May 25 '16 at 03:04