2

I have a data frame with 40 variables G1_a, G1_b, ... till G20_a, G20_b (stemming from a survey). I want to create 20 new variables G1 ... G20 that summarize the existing variables.

data <- data.frame(G1_a = c(0, 0, 0, 1, NA), 
               G1_b = c(0, 0, 1, 1, NA), 
               G2_a = c(0, 0, 0, 1, NA), 
               G2_b = c(0, 0, 1, 1, NA))

# Reshaping without for-loop:
data <- data %>% 
  mutate(G1 = case_when(
    G1_a == 1 ~ "own_offer", 
    G1_b == 1 ~ "no_offer", 
    T ~ NA_character_
  ))

data <- data %>% 
  mutate(G2 = case_when(
    G2_a == 1 ~ "own_offer", 
    G2_b == 1 ~ "no_offer", 
    T ~ NA_character_
  ))

I want to automate the creation of the new variables in a for-loop, something like:

# Reshaping with for-loop:
for(i in 1:2) {
 data <- data %>% 
   mutate(assign(paste0("G", i), case_when(
     get(paste0("G", i, "_a")) == 1 ~ "own_offer", 
     get(paste0("G", i, "_b")) == 1 ~ "no_offer", 
     T ~ NA_character_
    )))
  }

My question includes two parts:

1) Is it possible to combine assign with mutate? I'm aware of approaches like mutate(df, !!varname := Petal.width * n) (see here) to dynamically assign parameter names. However, I was unable to combine it with the data reshaping I want to run.

2) Does dplyr allow the use of paste0 together with case_when and mutate?

Ben Bolker
  • 211,554
  • 25
  • 370
  • 453
Annerose N
  • 477
  • 6
  • 14
  • 2
    is there a reason you can't return the results in a list rather than creating a bunch of new variables? It would be much easier, and more idiomatic ... – Ben Bolker Oct 05 '18 at 12:32
  • @BenBolker : the data frame contains many more variables. If I create a list, I have to eventually add it to the original data frame. I'd be happy for suggestions using such an approach. – Annerose N Oct 05 '18 at 13:38
  • what is your final target? – Ben Bolker Oct 05 '18 at 14:40
  • The final target is to get a clean dataset containing all survey responses in a format that allows easy and smooth analysis of the data. `G1_a`, `G1_b` etc. were provided as dummies, but should be encoded as categorical variables. Every `G1_a` and `G1_b` should become `G1`, `G2_a` and `G2_b` becomes `G2` etc. – Annerose N Oct 05 '18 at 14:46

1 Answers1

2

This is a little tricky, but I think it's the principled way to do it. The final result is a data frame with the desired columns, thus avoiding all of the get()/assign() headaches (and not cluttering up the workspace with lots of derived variables.) There are several steps where we change the shape of the data frame (wide -> long -> partially wide -> wide) using tidyr::gather() and tidyr::spread(). If it seems overwhelming, experiment with stopping the pipe sequence at various intermediate points to see what has been achieved so far.

library(tidyr)
library(dplyr)
dds <- (dd
  %>% mutate(case=seq(n()))    ## need a variable to distinguish rows in original data set
  %>% gather(var,val,-case)    ## -> long format: {case, var={G1_a,G1_b,...}, val={0,1,NA}}
  %>% separate(var,c("var","response"))  ## split to "G1","G2" + "a", "b"
  %>% spread(response,val)               ## convert back to semi-wide: {case, var, a, b}
  ## now collapse rows to categorical value, as above
  %>% mutate(offer=case_when(a==1 ~ "own_offer",
                             b==1 ~ "no_offer",
                             TRUE ~ NA_character_))
  %>% select(-c(a,b))          ## clean up now-redundant variables
  %>% spread(var,offer)        ## convert back to wide format: {case, G1, G2, ...}
  %>% select(-case)            ## now redundant
)

Result

         G1        G2
1      <NA>      <NA>
2      <NA>      <NA>
3  no_offer  no_offer
4 own_offer own_offer
5      <NA>      <NA>
Ben Bolker
  • 211,554
  • 25
  • 370
  • 453