Group by multiple columns and write each group into csv

Question

BRAND ID    FIRST_NAME  CITY
6     1167  Sara        Sarasota
6     1167  Jeffrey     Sarasota
6     1167  Shane       Sarasota
8     1167  Paul        Lakewood Ranch
6     10931 Carol       Grosse Pointe
6     10931 Nick        Detroit
8     10931 William     Grosse Pointe Farms
6     4935  Debra       Lacoochee
6     4935  Mark        Zephyr hills
6     4935  Michael     Zephyr hills

I have a .csv file look like the above table with customer information that I need to split up in multiple .csv files.

Each file needs to be split up by Brand and ID. In the example above, there would be 2 .csv files where (Brand = 6, ID = 1167) with 3 customer records and (Brand = 8, ID = 1167) with 1 customer record.

I need help with some code that can detect each unique ID because I have like 30 different IDs in my data file.

@zx8754 Just a point, OP asks about grouping and writing to csv (applying a function to each group) while the linked posts are giving answers to how to make a list of dfs and apply a function on each element of that list. Kind of inefficient compares to what I have, don't you think so? — M--, Sep 25 '17 at 20:53
@Masoud nothing against your answer. This is pretty simple task if one knows basics of R, no need for extra packages. Also, post attracted bad answer, hence I closed as duplicate. Linked posts should help to get started with the post. If disagree feel free vote to re-open. — zx8754, Sep 25 '17 at 21:10

M-- · Accepted Answer · 2017-09-25T22:08:57.690

3

Using data.table package we can use the followings:

grouping by one column:

library(data.table)
setDT(mydat)[, write.csv(c(.BY,.SD), paste0("BRAND_ID_", .BY, ".csv")), by=BRAND]

grouping by two (or more) columns:

For multiple columns, I can think of adding a helping column which is combination of desired columns for grouping:

library(data.table)
setDT(mydat)[, tempcol:=paste(ID,BRAND,sep="_")]
setDT(mydat)[, write.csv(.SD, paste0("ID_BRAND_", .BY, ".csv")),by=tempcol]

This gives us 5 files for each unique combination of IDs and Brands. They will be saved to current working directory. Read about ?write.csv if you want to have the outputs in a desired directory.

Data:

read.table(text='BRAND ID    FIRST_NAME  CITY
                6     1167  Sara        Sarasota
                6     1167  Jeffrey     Sarasota
                6     1167  Shane       Sarasota
                8     1167  Paul        Lakewood_Ranch    
                6     10931 Carol       Grosse_Pointe
                6     10931 Nick        Detroit
                8     10931 William     Grosse_Pointe_Farms
                6     4935  Debra       Lacoochee
                6     4935  Mark        Zephyr_hills
                6     4935  Michael     Zephyr_hills', header=T, quote="") -> mydat

edited Sep 25 '17 at 22:08

answered Sep 25 '17 at 19:38

M--

25,431
8
61
93

Thanks for your help. I tried your code but got this error Error in file(file, ifelse(append, "a", "w")) : invalid 'description' argument In addition: Warning message: In if (file == "") file <- stdout() else if (is.character(file)) { : the condition has length > 1 and only the first element will be used – lomingchun Sep 25 '17 at 19:49
@lomingchun The error explains that you're misusing the description= argument of `?file`. Also, I guess you won't need `stdout()` unless you're doing something very exotic. – Frank Sep 25 '17 at 19:52
What needs to change on the code that you provided to fix the error? If you need more info about what I'm doing, i'm happy to provide it. Thanks for your help so far. – lomingchun Sep 25 '17 at 19:54
@lomingchun That's the problem with how I name the output file. Using `.BY` as the name works when we group by only one column but does not work for multiple columns grouping. Do you need an explanatory name for each file or it's fine if they come as `file_1.csv`, `file_2.csv`, etc.? – M-- Sep 25 '17 at 20:00
Filename should have the ID and brand. "1167_6.csv" – lomingchun Sep 25 '17 at 20:26
@Masoud > setDT(mydat)[, write.csv(c(.BY,.SD), paste0("BRAND_ID_", .BY, ".csv")), by=BRAND] Empty data.table (0 rows) of 1 col: BRAND > setDT(mydat)[, tempcol:=paste(ID,BRAND,sep="_")] > setDT(mydat)[, write.csv(.SD, paste0("ID_BRAND_", .BY, ".csv")),by=tempcol] Error in eval(bysub, parent.frame(), parent.frame()) : object 'tempcol' not found – lomingchun Sep 25 '17 at 22:02
@lomingchun `setDT(mydat)[, tempcol:=paste(ID,BRAND,sep="_")];setDT(mydat)[, write.csv(.SD, paste0("ID_BRAND_", .BY, ".csv")),by=tempcol]` run this (both lines in the second block of code in my answer). First one is not for your question it's a simplified answer for a situation that only brand (one column is needed for grouping). **You need to add the `tempcol` which first line in the second block takes care of it and then you can group by that.** Read the answer carefully, including the text in between the lines of code to avoid from these confusions. – M-- Sep 25 '17 at 22:05

Group by multiple columns and write each group into csv

1 Answers1

grouping by one column:

grouping by two (or more) columns: