How can I write dplyr groups to separate files?

Question

I'm trying to create separate .csv files for each group in a data frame grouped with dplyr's group_by function. So far I have something like

by_cyl <- group_by(mtcars, cyl)
do(by_cyl, write_csv(., "test.csv"))

As expected, this writes a single .csv file with only the data from the last group. How can I modify this to write multiple .csv files, each with filenames that include cyl?

score 33 · Answer 1 · edited Mar 01 '23 at 16:23

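As of dplyr 0.8.0, this can be done with group_by() and group_walk(). Inside the lambda, .x is the data for the current group and .y is a one-row tibble holding the group key: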

library(dplyr)
library(readr)
mtcars %>%
   group_by(cyl) %>%
   group_walk(~ write_csv(.x, paste0(.y$cyl, "test.csv")))
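A minimal sketch of one variation on the above, assuming dplyr 1.0.0 or later (where the argument is spelled .keep; in 0.8.x it was keep). By default group_walk() drops the grouping column from .x, so the written files contain no cyl column; .keep = TRUE retains it. The output directory out_dir and the "mtcars_cyl_" file prefix are assumptions made for illustration.

```r
library(dplyr)
library(readr)

out_dir <- tempdir()  # hypothetical output location; replace with your own

mtcars %>%
  group_by(cyl) %>%
  group_walk(
    # .x: rows of the current group, .y: one-row tibble with the group key
    ~ write_csv(.x, file.path(out_dir, paste0("mtcars_cyl_", .y$cyl, ".csv"))),
    .keep = TRUE  # keep the cyl column in each written file
  )
```

Building the path with file.path() keeps the files out of the working directory, which is convenient when there are many groups.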

answered Feb 21 '19 at 12:26 by akrun · edited Mar 01 '23 at 16:23 by psychonomics

I had no idea this function existed, thanks for your answer! – Andrew Brēza May 15 '19 at 12:19
This is so much easier now! Thanks. – damo Oct 11 '19 at 09:05

score 21 · Accepted Answer · answered Dec 20 '16 at 01:03

You can wrap the CSV write in a custom function, as follows. Note that the function must return a data.frame; otherwise do() fails with: Error: Results are not data frames at positions

This writes three CSV files named "mtcars_cyl_4.csv", "mtcars_cyl_6.csv" and "mtcars_cyl_8.csv".

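library(dplyr)

customFun <- function(DF) {
  # Write this group's rows to a file named after its cyl value
  write.csv(DF, paste0("mtcars_cyl_", unique(DF$cyl), ".csv"))
  # do() requires a data frame to be returned
  return(DF)
}

mtcars %>%
  group_by(cyl) %>%
  do(customFun(.))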

answered Dec 20 '16 at 01:03 by Silence Dogood

Exactly what I needed! As an aside - in my actual case I was grouping by two variables; turns out the order by which you group them is really important. For the example, "cyl" would have to be the first grouping for this to work. – Nat Dec 20 '16 at 16:46
Beautiful function wrapper! Thank you! – philiporlando Jan 31 '18 at 03:31
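
For the two-variable case raised in the comment above, a minimal sketch using group_walk() (dplyr 0.8.0 or later): .y is a one-row tibble holding every grouping key, so each key can be referenced by name and the order of the grouping variables does not affect the file names. The second grouping variable gear and the output directory out_dir are assumptions made for illustration.

```r
library(dplyr)
library(readr)

out_dir <- tempdir()  # hypothetical output location; replace with your own

mtcars %>%
  group_by(cyl, gear) %>%
  group_walk(~ write_csv(
    .x,
    # one file per (cyl, gear) combination present in the data
    file.path(out_dir, paste0("mtcars_cyl_", .y$cyl, "_gear_", .y$gear, ".csv"))
  ))
```

Only combinations that actually occur in the data produce a file.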

score 7 · Answer 3 · answered Nov 21 '17 at 20:41

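The following works without a custom function, because current versions of readr's write_csv() invisibly return their input, so do() still receives a data frame: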

library(dplyr)
library(readr)
group_by(mtcars, cyl) %>%
  do(write_csv(., paste0(unique(.$cyl), "test.csv")))

answered Nov 21 '17 at 20:41 by CPak

I get this "Error: Results are not data frames at positions: 1, 2, 3, 4, 5, 6, 7" while the answer by @OdeToMyFiddle works. – val Jan 27 '18 at 08:28

Rob Donnelly · Answer 4 · 2017-03-11T01:05:02.480

If you are willing to use data.table, there is a slightly less clunky way to do it.

require(data.table)
# Because this is a built in table we have to make a copy first
mtcars <- mtcars 
setDT(mtcars) # convert the data into a data.table

mtcars[, write.csv(.SD, paste0("mtcars_cyl_", .BY, ".csv")), by = cyl]

Note that the resulting files will not have a cyl column (it would be redundant, since the value is stored in the file name, but you may want to keep it for other reasons).

If you want cyl to be included in the output as a column, you can use

mtcars[, write.csv(c(.BY,.SD), paste0("mtcars_cyl_", .BY, ".csv")), by=cyl]

answered Mar 11 '17 at 00:45 by Rob Donnelly · edited Mar 11 '17 at 01:05

You get an error if you convert one of the built in tables to a data.table without copying it first. Here's the error you get "Error in setDT(mtcars) : Can not convert 'mtcars' to data.table by reference because binding is locked. It is very likely that 'mtcars' resides within a package (or an environment) that is locked to prevent modifying its variable bindings. Try copying the object to your current environment, ex: var <- copy(var) and then using setDT again." – Rob Donnelly Mar 11 '17 at 00:55
Thanks for the suggestions Rich. – Rob Donnelly Mar 11 '17 at 00:58
@RobDonnelly, if you really want to improve the speed of your code, replace `write.csv` with `fwrite`, which is the native `data.table` way of writing a `.csv`. It is very fast because it writes in parallel. – rafa.pereira May 06 '17 at 09:00
@RobDonnelly Is there a way to use `fwrite()` instead of using `write.csv()` in your code? the former one is way faster when applied to large datasets. – Miao Cai Jul 24 '19 at 03:39
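
A minimal sketch of the fwrite() variant the comments ask about, assuming a recent data.table. It uses as.data.table() to make a copy instead of setDT(), which sidesteps the locked-binding error described above. The names dt and out_dir are assumptions made for illustration.

```r
library(data.table)

out_dir <- tempdir()         # hypothetical output location; replace with your own
dt <- as.data.table(mtcars)  # makes a copy, so the built-in mtcars is untouched

# .SD holds the current group's rows (without cyl); .BY holds the group value
dt[, fwrite(.SD, file.path(out_dir, paste0("mtcars_cyl_", .BY$cyl, ".csv"))),
   by = cyl]
```

As with the write.csv() version, the cyl column is not written, because .SD excludes the grouping column.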
