I have list data for which I used split:

x <- split(A, f = A$Col_1)

It works beautifully. But now I need to write each chunk of the split to an individual .csv file. There are 2100 chunks of 140 rows each. Let's call them "1:2100". I would like to create something that writes "1" to "~/full_path_name/A1.csv", then goes to "2" and writes it to "~/full_path_name/A2.csv", then "3" to "~/full_path_name/A3.csv", etc.

I included "~/full_path_name/" because down the road this path name will change for other data using the same code, and for my own understanding I need to see it in the code. I don't know how to write a small sample of what I am asking for for someone to correct because I don't know how to write it at all.

Can someone make a suggestion on how to do this? Thank you.

I have only been coding for a month and am entirely self-taught. I do not have a background in other programming languages. I have no one to ask for help but here. I struggle with the terminology, so please understand if I am not asking in the proper way, and I will try to correct it if need be.

EDIT, AFTER DOING SOME FURTHER RESEARCH --

This is what I have found elsewhere on SO from @RichPaloo, and my adaptations below that:

library(readr) # provides write_csv()

#example data.frame
df <- data.frame(x = 1:4, y = c("a", "a", "b", "b")) 

#split into a list by the y column 
l <- split(df, df$y) 

#the names of the list are the unique values of the y column 
nam <- names(l) 

#iterate over the list and the vector of list names and write csvs
for(i in 1:length(l)) {
  write_csv(l[[i]], paste0(nam[i], ".csv"))
}

This is my version:

bcc4.5_WINTER <- split(bcc4.5_FinalWinterRO, f = bcc4.5_FinalWinterRO$HUC8) 

nam <- names(bcc4.5_WINTER) 

for(i in 1:length(bcc4.5_WINTER)) {
write_csv(bcc4.5_WINTER[[i]], paste0(“~/Rprojects/BCC_CSM1_1_RCP_45/Winter/”, nam[i], “.csv”))
}

I appear to have a problem with the folder within my home folder, "/BCC_CSM1_1_RCP_45/Winter/". RStudio says "unexpected token" at both ends of it, but not at "~/Rprojects". Can I not send something to a folder within my home folder?

It also shows redlines under the quotes around ".csv" near the end. I don't know what to make of this because it's exactly what the person used successfully, apparently, in another post. Thank you.

Paul
  • Hi, have you tried something like `lapply(bcc4.WINTER, FUN = function(x) print(x))`? Note that you can do whatever you want with the custom function. Just keep in mind that the `x` is a chunk of your list. This piece of code displays the same thing as `print(bcc4.WINTER)`, its purpose is to show how you can use `lapply()` here. – Paul May 05 '21 at 06:39
  • Also, please, post how you solved your problem as an answer of your question, it might help others :) – Paul May 05 '21 at 06:41
  • Thank you, @Paul. This is somewhat helpful, however it requires (from what I can tell) that I manually input the name of each chunk. There are 2099 chunks. I do have the names in a list in a .csv, is there a way I could code it to iteratively choose the next chunk, writing it as "chunk_name.csv", until complete? BTW, I did share how I solved the first part of the problem. It is the "split" code, above. I appreciate your help. – David Montana May 05 '21 at 21:17
  • Just to end with the link between your questions, [the 1st one](https://stackoverflow.com/questions/67326138/select-next-value-in-column-on-which-to-perform-a-function-iteratively-in-r) is not linked to this question and does not have the right tags. For the sake of clarity, you could merge both of your questions into this one, for example. – Paul May 06 '21 at 05:52
  • Moving on to the naming topic. If a "chunk" is a piece of the list (like `ex_list[["setosa"]]`) then you do not need to manually input the name. At this point I feel it would be useful for you to share a [reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) example. To produce a minimal data set, you can use `head()` or `subset()`. Then use `dput()` to give us something that can be put in R immediately. Alternatively, you can use base R datasets (to see the complete list: `library(help = "datasets")`). – Paul May 06 '21 at 05:55
  • @Paul - I deleted the first question since no one had answered it and it was irrelevant at this point anyway. I have tried to simplify what I am asking for. I hope you understand that I am not able to produce a MRE (yes, I read the link) because I am not having an "error", I don't know how to do it *at all* or even the words to use besides what I've asked here. Your first example with base code is just what I need except that I need it to move to the next name in the list and do the same thing again, through the whole list. Your second example produces the same as my code above. Thanks again. – David Montana May 06 '21 at 23:47
  • @Paul, I am going to see if I can figure out the mapply() you suggested. – David Montana May 06 '21 at 23:58
  • @Paul - I did not understand how to write what I need in mapply. – David Montana May 07 '21 at 03:04
  • Your `for` loop seems correct. However, when I copy/paste `paste0(“~/Rprojects/BCC_CSM1_1_RCP_45/Winter/”, nam[i], “.csv”)` in RStudio, it tells me that the quotes are wrong, **“...”** is NOT **"..."**. Seems like you have a typo issue. Try: `paste0("~/Rprojects/BCC_CSM1_1_RCP_45/Winter/", nam[i], ".csv")` – Paul May 07 '21 at 09:11
  • OH. I had copied it from SO into MSWord to save it in my notes. That must have turned the quotes into smart quotes. I did not even see that, and even in your example it took me quite a while to understand what you meant. Thanks. – David Montana May 07 '21 at 22:02
  • That said, it is not working. I input my names in place of the "l". ``bcc1_45Win <- function(bcc4.5_WINTER_i, name_i) { print(bcc4.5_WINTER_i) # print the output in console write.csv(bcc4.5_WINTER_i, file = paste0("~/Rprojects/BCC_CSM_1_1_RCP_45/Winter/", name_i, ".csv")) # save the file on your computer } mapply(FUN = save_fun, bcc4.5_WINTER_i = bcc4.5_WINTER, name_i = names(bcc4.5_WINTER), SIMPLIFY = FALSE)`` Nothing is being written to my folder. – David Montana May 07 '21 at 22:15
  • Two things: 1) the first problem was with `l_i`. I am not iterating thru a series of files. So I removed that from your code in ea instance. This worked perfectly for the example file. 2) when I try my own file names (with the `_i` removed) I am getting this msg: _Error in file(file, ifelse(append, "a", "w")) : cannot open the connection In addition: Warning message: In file(file, ifelse(append, "a", "w")) :_ Research says it's a permissions prob but I set admin prmns in Rstudio and restarted. Still happens. (Also rmvd `print()` though it didn't make a difference.) Thanks for your help. – David Montana May 08 '21 at 00:04

2 Answers

1

So, the code example above (@Paul) worked except the df[l] was not being iterated, so I removed the _i from each l instance. The final problem I had (in comments above) was because the path name was not complete.

I used fwrite() rather than write.csv because it gave me better feedback as I struggled with mistakes. This gave me what I needed:

library(data.table) # provides fwrite()

#split the data frame into chunks by the values in a column, in this case column "BBB"

df <- split(old_df, f = old_df$BBB)

#write those chunks to individual .csv files with the name being the name of each chunk

save_fun <- function(df, name_i) {
  fwrite(df, file = paste0("~/Desktop/projects_folder/", name_i, ".csv"))
}

#save the file on your computer

mapply(FUN = save_fun, df, name_i = names(df), SIMPLIFY = FALSE)
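
The "cannot open the connection" error from the comments typically shows up when the target folder does not exist or the path is mistyped. Below is a minimal sketch of checking for the folder before writing. The folder name, the `chunks` data, and the use of `tempdir()` are assumptions made so the example runs anywhere, and it uses base `write.csv()` rather than `fwrite()` to avoid needing any package.

```r
# Hypothetical output folder; tempdir() is used only so the sketch runs anywhere.
# In real use this would be something like "~/Desktop/projects_folder".
out_dir <- file.path(tempdir(), "projects_folder")

# Create the folder if it is missing, so the file connections can be opened
if (!dir.exists(out_dir)) {
  dir.create(out_dir, recursive = TRUE)
}

# Small made-up data set, split by column BBB
old_df <- data.frame(x = 1:4, BBB = c("a", "a", "b", "b"))
chunks <- split(old_df, f = old_df$BBB)

# Build one full file path per chunk, then write each chunk
out_files <- file.path(out_dir, paste0(names(chunks), ".csv"))
for (i in seq_along(chunks)) {
  write.csv(chunks[[i]], file = out_files[i], row.names = FALSE)
}
```

Building the paths with `file.path()` avoids missing or doubled slashes, and changing the destination later only means editing `out_dir`.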

Much thanks to Paul.

0

Investigating the potential typo problem

Please see the two lines below:

  write.csv(l[[1]], file = paste0("./a_folder/", names(l)[1], ".csv"))
  write.csv(l[[1]], file = paste0(“./a_folder/”, names(l)[1], “.csv”))

Line 1 will save the file. Note that "./a_folder/" and ".csv" are seen as text.

In line 2, “./a_folder/” and “.csv” are not recognized as text because they are wrapped in curly ("smart") quotes rather than straight quotes. Line 2 produces an error: unexpected input in " write.csv(l[[1]], file = paste0(“"

RStudio's syntax highlighting helps you spot this problem: properly quoted text is colored as a string, while text wrapped in curly quotes is not.
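
If code has passed through a word processor, the curly quotes can also be replaced programmatically before the code is used. This is only a sketch: `fix_quotes()` is a hypothetical helper written for this example, not a function from base R or any package, and it assumes the pasted code is held in a character string.

```r
# Hypothetical helper: swap curly quotes for straight quotes in a string of code
fix_quotes <- function(code_text) {
  code_text <- gsub("[\u201C\u201D]", "\"", code_text) # curly double quotes
  gsub("[\u2018\u2019]", "'", code_text)               # curly single quotes
}

# Code as it might look after a round trip through a word processor
pasted <- "paste0(\u201C./a_folder/\u201D, nam, \u201C.csv\u201D)"

fixed <- fix_quotes(pasted)

# parse() raises an error on the curly-quoted version but accepts the fixed one
parsed <- parse(text = fixed)
```

Retyping the quotes by hand in RStudio works just as well for a single line. The helper is mainly useful when a whole script was pasted from notes.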


Thoughts about not using a for loop

I think a better way to go (especially when you have a large dataset) is to use lapply() or mapply(). These functions take each "chunk" of a list and apply a function to it.

lapply() does not pass the name of each chunk to the function, which is annoying when you want to use that name for the file on your computer. mapply() comes in handy in this situation.

Here is an example using the example data from the question.

# example data.frame
df <- data.frame(x = 1:4, y = c("a", "a", "b", "b"))
# split df
l <- split(df, df$y)

# save each "chunk" of l as a .csv file on a hard drive

# make sure the output folder exists, otherwise write.csv() cannot open the file
dir.create("./a_folder", showWarnings = FALSE)

# 1st, create a function that takes a "chunk" of your list and its name as inputs

save_fun <- function(l_i, name_i) {
  print(l_i) # print the output in console
  write.csv(l_i, file = paste0("./a_folder/", name_i, ".csv")) # save the file on your computer
}
# 2nd, use mapply (and not lapply) to apply the previous function to each chunk/name pair

mapply(FUN = save_fun, l_i = l, name_i = names(l), SIMPLIFY = FALSE) # see ?mapply for how to use mapply()
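
As an alternative, lapply() can still be used if you iterate over the names of the list instead of over the chunks, and look each chunk up by name inside the function. A minimal sketch on the same example data, writing to `tempdir()` (an assumption made here so that it runs without creating a folder first):

```r
# example data.frame, split into a named list as above
df <- data.frame(x = 1:4, y = c("a", "a", "b", "b"))
l <- split(df, df$y)

# assumed output location for this sketch; replace with your own folder
out_dir <- tempdir()

# iterate over the names, and fetch the matching chunk with l[[name_i]]
written <- lapply(names(l), function(name_i) {
  out_file <- file.path(out_dir, paste0(name_i, ".csv"))
  write.csv(l[[name_i]], file = out_file, row.names = FALSE)
  out_file # return the path so you can check what was written
})
```

Here `written` holds the file paths, which makes it easy to confirm afterwards that every chunk produced a file.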
Paul