Group By Multiple Columns in R

Question

I have a data set that I am looking at ... 3 columns that have name, store, and amount .... looks like this

name        store     amount
John Doe    Target    150

Now the trick is this John Doe may be in the file again but with this data

name        store     amount
John Doe    Walmart   50

I want to summarize and ADD new columns for all of John Doe's activity like the following:

name        store A   amount A   store B   amount B
John Doe    Target    150        Walmart   50

I have tried transpose and ddlyr but am not having any luck.
I have tried transpose and group by but it seems like you can't group by multiple columns

Thanks for your help...

Hi, and welcome to SO! Please take a look at [how to ask](https://stackoverflow.com/help/how-to-ask) and [how to make a good reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). In particular, the easiest way to get a quick answer is to provide sample data (that can be copy pasted), example of desired output, and the code you have already tried that doesn't give your desired output. It is definitely possible to group by multiple columns but without knowing exactly what you have tried we can't see the misunderstanding clearly. — Calum You, Jul 19 '19 at 20:12
Base R's `by` accepts as its second argument a list of columns, so something like `by(x, x[,c("name","store"), function(z) { ... })` will get you each person in each store, individually (you need to fill in the `...` here). If you're into `tidyverse` packages, then `group_by(x, name, store)` works well, too. And `data.table` provides `xDT[, ..., by=c("name", "store")]`. — r2evans, Jul 19 '19 at 20:12
Duplicate of [How to reshape data from long to wide format](https://stackoverflow.com/questions/5890584/how-to-reshape-data-from-long-to-wide-format) — M--, Jul 20 '19 at 01:32

score 1 · Answer 1 · answered Jul 19 '19 at 22:13

I have a solution that works for the particular example you've given, but I don't find it particularly elegant and it wouldn't work if you have more than 2 occurrences for each name (in that case you'd have to give more options to the mutate verb.

df <- tibble(name = c("John Doe"),
             store = c("Target", "Walmart"),
             amount = c(150, 50))

coalesce_by_column <- function(df) {
    return(dplyr::coalesce(!!! as.list(df)))
}

df2 <- df %>% 
    group_by(name) %>% 
    mutate(store_k = c("store A", "store B"),
           amount_k = c("amount A", "amount B")) %>% 
    spread(store_k, store) %>%
    spread(amount_k, amount) %>%
    summarise_all(coalesce_by_column)

The coalesce_by_column function combines the disjointed rows that contain NAs (check here for more details). If you want to rearrange the column order, you can easily do it with select.

Thanks! I think I got something with the Mutate function ....the mutate function is pretty awesome and I think I got a solution now ! — Anthony Will, Jul 22 '19 at 18:16
Great! Maybe you could update your question with the solution that worked for you in the end. — Arienrhod, Jul 23 '19 at 09:35

Group By Multiple Columns in R

1 Answers1