How to convert a string to a variable and to loop through group_by?

Question

Assume I have a dataset with two columns, Location and Product, that shows how many of each product is sold at each location. I create a contingency table for the number of each product sold at each location:

data%>% 
  group_by(Location,Product)%>%
  summarize(n=n()) %>%
  pivot_wider(names_from = product, values_from = n)

Now, imagine that instead of a single Product column, I have US_Product, Japan_Product,..., Germany_Product. How can I create my contingency tables in a for loop? NOTE: when I create a vector of products like p<-c("Product1", "Product2",..., "Product3") and loop through these products, I get an error message because these are strings and not variable names.

Here is a minimal example:

Location <- c("AB","ON","MN","AB","ON")
Product1<-c("Type1","Type2","Type1","Type3","Type1")
Product2<-c("Type3","Type2","Type3","Type3","Type2")
Product3<-c("Type1","Type2","Type1","Type1","Type1")
data <- tibble(Location,Product1,Product2,Product3)
data%>% 
  group_by(Location,Product1)%>%
  summarize(n=n()) %>%
  pivot_wider(names_from = Product1, values_from = n) #this works as expected

#now I want to do the same thing in a loop
prodV <- c("Product1","Product2","Product3")
for (i in c(1:3)){
  var <- prodV[i]
  data%>% 
    group_by(Location,var)%>%
    summarize(n=n()) %>%
    pivot_wider(names_from = var, values_from = n)   
}

Welcome to Stack Overflow! Could you make your problem reproducible by sharing a sample of your data so others can help (please do not use `str()`, `head()` or screenshot)? You can use the [`reprex`](https://reprex.tidyverse.org/articles/articles/magic-reprex.html) and [`datapasta`](https://cran.r-project.org/web/packages/datapasta/vignettes/how-to-datapasta.html) packages to assist you with that. See also [Help me Help you](https://speakerdeck.com/jennybc/reprex-help-me-help-you?slide=5) & [How to make a great R reproducible example?](https://stackoverflow.com/q/5963269) — Tung, Jan 17 '20 at 19:57

akrun · Answer 1 · 2020-01-17T20:07:00.430

If we need to use it in a loop, then one option is map

library(dplyr)
library(purrr)
library(tidyr)
map(p, ~ 
         data%>% 
           group_by_at(vars("Location", .x)) %>%
           summarize(n=n()) %>%
           pivot_wider(names_from = .x, values_from = n))

Using a reproducible example

data(mtcars)
p <- c("cyl", "vs", "am")
map(p, ~ 
         mtcars %>% 
             group_by_at(vars('gear', .x)) %>% 
             summarise(n = n()) %>%
             pivot_wider(names_from = .x, values_from = n) )

Or if we use a for loop, then create an empty list to store the output from each iteration ('out'), loop over the 'p' values, and change only the .x part from map while assigning the output to each element of 'out' list

out <- vector('list', length(p))
names(out) <- p
for(p1 in p) {
       out[[p1]] <- data %>%
                      group_by_at(vars("Location", p1)) %>%
                      summarize(n = n()) %>%
                      pivot_wider(names_from = p1, values_from = n)
   }

score 0 · Answer 2 · answered Jan 17 '20 at 20:23

Not sure if the following is the thing you are after. Below is a base R solution to make contingency tables:

p <- c("US_Product","Japan_product","Germany_Product")
res <- Map(function(x) table(df[c("Location",x)]),p)

such that

> res
$US_Product
        US_Product
Location a b c
      XX 2 0 1
      YY 1 1 2

$Japan_product
        Japan_product
Location d e f
      XX 0 2 1
      YY 3 0 1

$Germany_Product
        Germany_Product
Location g i j
      XX 0 3 0
      YY 1 1 2

Dummy DATA

df <- > dput(df)
structure(list(Location = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 
2L), .Label = c("XX", "YY"), class = "factor"), US_Product = structure(c(1L, 
3L, 1L, 2L, 1L, 3L, 3L), .Label = c("a", "b", "c"), class = "factor"), 
    Japan_product = structure(c(2L, 2L, 3L, 3L, 1L, 1L, 1L), .Label = c("d", 
    "e", "f"), class = "factor"), Germany_Product = structure(c(2L, 
    2L, 2L, 2L, 3L, 1L, 3L), .Label = c("g", "i", "j"), class = "factor")), class = "data.frame", row.names = c(NA, 
-7L))

score 0 · Answer 3 · answered Jan 17 '20 at 22:15

I was able to handle the problem using group_by_at as opposed to group_by. According to dplyr: whats the difference between group_by and group_by_ functions? if one needs to have inputs with quotation marks, SE versions of functions should be used, instead of NSE versions---please see the link for a detailed explanation.

prodV <- c("Product1","Product2","Product3")
for (i in c(1:3)){
  var <- prodV[i]
  a<-data%>% 
    group_by_at(vars("Location",var))%>%
    summarize(n=n()) %>%
    pivot_wider(names_from = var, values_from = n)   
  print(a)
}

How to convert a string to a variable and to loop through group_by?

3 Answers3