1

Suppose I have the result of a supermarket fruit stock survey stored in a data frame:

stock <- data.frame(
    store = c("Asda", "Booths", "Co-op"),
    apple = c(1, 0, 0),
    banana = c(1, 1, 0),
    coconut = c(0, 0, 0)
)

which looks like

   store apple banana coconut
1   Asda     1      1       0
2 Booths     0      1       0
3  Co-op     0      0       0

My goal:

I want to convert the above columns of binary survey results into a character vector of stock summary for each supermarket as below:

   store        fruits
1   Asda apple, banana
2 Booths        banana
3  Co-op              

My solution:

Step 1: I used a for loop to replace all 1s in the binary columns with the corresponding column names:

for(i in names(stock)[2:4]) {
    stock[which(stock[[i]] == 1), i] <- i
}

and got

   store apple banana coconut
1   Asda apple banana       0
2 Booths     0 banana       0
3  Co-op     0      0       0

Step 2: I use tidyr::unite() to concatenate the individual fruit columns into a character vector column:

library(tidyverse)
stock <- unite(stock, fruits, apple:coconut, sep = ", ")

giving me

   store           fruits
1   Asda apple, banana, 0
2 Booths     0, banana, 0
3  Co-op          0, 0, 0

Step 3: I had to use stringr::str_replace_all() to remove all unwanted 0s and comma separators:

library(stringr)
stock$fruits <- str_replace_all(stock$fruits, "0, |, 0|0", "")

Although this could get me the result I want, I find my solution rather clumsy, especially the looping part. Could anyone kindly share with me a more efficient and straightforward solution, please? Many thanks in advance!

elarry
  • 521
  • 2
  • 7
  • 20

2 Answers2

3

The task requires to reshape the input data from wide to long format.

Although the question is explicitely tagged with tidyverse, I would like to start with a concise data.table solution using melt() which I'm more acquainted with:

library(data.table)
melt(setDT(stock), id.vars = "store")[
  value > 0, .(fruits = toString(variable)), keyby = store][.(stock$store)]
    store        fruits
1:   Asda apple, banana
2: Booths        banana
3:  Co-op            NA

It coerces stock to class data.table and reshapes it from wide to long format. Then, only rows which have at least one fruit are considered in the subsequent aggregation where the result is grouped by store. toString() is used for aggregation which is a concise alternative to paste(). In order to include all stores, even those without any fruit, a final right join is required.


The same goal can be achieved by using functions from the tidyr and dplyr packages as requested by the OP:

library(magrittr)
stock %>% 
  tidyr::gather(fruits, , -store) %>% 
  dplyr::filter(value > 0) %>% 
  dplyr::group_by(store) %>% 
  dplyr::summarise(toString(fruits)) %>% 
  dplyr::right_join(stock %>% dplyr::select(store))
# A tibble: 3 x 2
   store `toString(fruits)`
  <fctr>              <chr>
1   Asda      apple, banana
2 Booths             banana
3  Co-op               <NA>

Both results are equivalent.

Note that references to tidyverse functions are made explicit in order to avoid name conflicts due to a cluttered name space.

Uwe
  • 41,420
  • 11
  • 90
  • 134
  • Thanks SO much for kindly offering two solutions to my question while paying considerate attention to my question tags, and for your extra coding comments explaining things clearly! Although my preferred `tidyverse` solution is pretty neat, I am also amazed by the conciseness of the `data.table` alternative. – elarry Jun 26 '17 at 15:29
  • @elarry To be honest, I prefer `data.table` not only for its concise code but also for performance reasons for larger problems. E.g., see [my benchmark of a different question](https://stackoverflow.com/a/44755588/3817004). – Uwe Jun 26 '17 at 15:34
2

Assuming unique store names, only 1s and 0s, and no missing values:

library(dplyr)
library(tidyr)
result <- stock %>%
  gather(fruit, binary, -store) %>%
  mutate(fruit = if_else(binary == 1, fruit, NA_character_)) %>%
  select(-binary) %>%
  filter(!is.na(fruit)) %>%
  group_by(store) %>%
  summarize(fruits = paste(fruit, collapse = ", ")) %>%
  ungroup() %>%
  right_join(stock %>% select(store)) %>%
  mutate(fruits = if_else(is.na(fruits), "", fruits))
arcvetkovic
  • 106
  • 2
  • Thanks very much for sharing a `paste()` alternative for converting the binary data to character vectors - it's particularly useful for people like me who don't know the `toString()` function. :) – elarry Jun 26 '17 at 15:45