How to write a for loop in R that recodes multiple variables?

Question

I am a self-taught programmer with a few years' experience in MATLAB. I'm brand new to R and this is my first question on Stack Overflow.

I am trying to recode multiple variables in a dataframe using recode from dplyr. In the code below, I provide a snippet of data and the list of options, opt_dass, I want to use with recode. I would like to convert the string values in each variable starting with "dass" to a number - Never = 0, Sometimes = 1, and so on.

I am aware that there are multiple approaches to this problem, including ifelse, lapply, and case_when. In the example below, I am wondering why I am getting the error about a non-language object. I am using paste0 to create the variable names to reference in data. I've done a lot of reading about how to reference column names in a for loop in R, and I still haven't found the answer.

library(tidyverse)

data <- structure(
  list(
    id = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "11"),
    dass1_t1 = c(
      "Sometimes", "Often", "Often", "Almost Always", "Sometimes", "Sometimes",
      "Sometimes", "Sometimes", "Sometimes", "Sometimes"
    ),
    dass2_t1 = c(
      "Sometimes", "Never", "Often", "Sometimes", "Sometimes", "Never",
      "Sometimes", "Sometimes", "Often", "Sometimes"
    ),
    dass3_t1 = c(
      "Often", "Sometimes", "Never", "Never", "Never", "Sometimes", "Never",
      "Never", "Sometimes", "Sometimes"
    ),
    dass4_t1 = c(
      "Never", "Never", "Never", "Never", "Never", "Sometimes", "Never",
      "Never", "Never", "Sometimes"
    ),
    dass5_t1 = c(
      "Almost Always", "Sometimes", "Never", "Sometimes", "Never", "Sometimes",
      "Sometimes", "Never", "Almost Always", "Often"
    )
  ),
  row.names = c(NA, -10L),
  class = "data.frame"
)

opt_dass <- list("Never"=0,"Sometimes"=1,"Often"=2,"Almost Always"=3) # list - chr to num

# my attempt at a for loop to recode
for (i in 1:5) {
  attach(data)
  paste0("dass_", i, "_t1") <- recode(paste0("dass_", i, "_t1"), !!!opt_dass, .default=NA_real_)
}

#> Error in paste0("dass_", i, "_t1") <- recode(paste0("dass_", i, "_t1"), : target of assignment expands to non-language object

Created on 2020-11-04 by the reprex package (v0.3.0)

Bonus question: Is there a way to write one for loop that could accomplish recoding for multiple sets of variables with different sets of options? I have a dataset with multiple self-report measures where different string responses have different numeric values. I think this would involve some metaprogramming and would love to hear your ideas!

score 1 · Answer 1 · answered Nov 04 '20 at 17:01

First, use a named vector instead of a named list

opt_dass <- c("Never"=0,"Sometimes"=1,"Often"=2,"Almost Always"=3)

Then just

mutate(data, across(starts_with("dass"), ~unname(opt_dass[.])))

Output

   id dass1_t1 dass2_t1 dass3_t1 dass4_t1 dass5_t1
1   1        1        1        2        0        3
2   2        2        0        1        0        1
3   3        2        2        0        0        0
4   4        3        1        0        0        1
5   5        1        1        0        0        0
6   6        1        0        1        1        1
7   7        1        1        0        0        1
8   8        1        1        0        0        0
9   9        1        2        1        0        3
10 11        1        1        1        1        2
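To see why the named vector matters, here is a minimal base-R sketch of the lookup on its own. The responses vector is made up for illustration, and "Rarely" is a hypothetical label added only to show what happens to a value that has no entry in opt_dass. Indexing a named vector by a character vector returns a plain numeric vector, whereas the same indexing on a named list would return a list.

```r
# Named numeric vector: names are the response labels, values are the scores
opt_dass <- c("Never" = 0, "Sometimes" = 1, "Often" = 2, "Almost Always" = 3)

# Hypothetical responses, including one label that is not in the lookup
responses <- c("Often", "Never", "Almost Always", "Rarely")

# Indexing by a character vector looks up each label by name;
# unname() drops the names from the result
scores <- unname(opt_dass[responses])
scores
#> [1]  2  0  3 NA
```

Labels that are missing from the lookup come back as NA, which plays the same role as .default = NA_real_ in the question's recode call.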

score 0 · Answer 2 · answered Nov 04 '20 at 16:59

I would probably just make them factors and then use the underlying integer codes (minus 1):

data %>%
  mutate_at(.vars = vars(starts_with("dass")),
            .funs = ~factor(x = ., levels = c("Never", "Sometimes", "Often", "Almost Always"))) %>%
  mutate_at(.vars = vars(starts_with("dass")),
            .funs = ~as.integer(.) - 1)
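As a rough illustration of what the two steps do to a single column, here is a base-R sketch on a made-up responses vector (the names levels_dass, responses, and codes are only for this example). The integer code of each value is its position in levels, so the order of the levels determines the resulting numbers.

```r
# Levels listed in the order that should map to 0, 1, 2, 3
levels_dass <- c("Never", "Sometimes", "Often", "Almost Always")

# Hypothetical responses for one column
responses <- c("Sometimes", "Never", "Almost Always", "Often")

# Step 1: convert to a factor with explicit levels
f <- factor(responses, levels = levels_dass)

# Step 2: take the 1-based integer codes and shift them to start at 0
codes <- as.integer(f) - 1L
codes
#> [1] 1 0 3 2
```

Any value that is not listed in levels becomes NA in the factor, and therefore NA in the codes.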

score 0 · Answer 3 · answered Nov 04 '20 at 17:08

Reshape from wide to long, recode, then reshape back from long to wide:

pivot_longer(data, cols = -1) %>% 
  mutate(value = recode(value, "Never"=0,"Sometimes"=1,
                        "Often"=2,"Almost Always"=3)) %>% 
  pivot_wider(., id_cols = "id")

# # A tibble: 10 x 6
#    id    dass1_t1 dass2_t1 dass3_t1 dass4_t1 dass5_t1
#    <chr>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
#  1 1            1        1        2        0        3
#  2 2            2        0        1        0        1
#  3 3            2        2        0        0        0
#  4 4            3        1        0        0        1
#  5 5            1        1        0        0        0
#  6 6            1        0        1        1        1
#  7 7            1        1        0        0        1
#  8 8            1        1        0        0        0
#  9 9            1        2        1        0        3
# 10 11           1        1        1        1        2

score 0 · Answer 4 · answered Nov 04 '20 at 21:20

I really like the lookup() function from the qdapTools package for recoding. Here's how to use it for your use case:

library(tidyverse)
library(qdapTools)

data <- structure(
  list(
    id = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "11"),
    dass1_t1 = c(
      "Sometimes", "Often", "Often", "Almost Always", "Sometimes", "Sometimes",
      "Sometimes", "Sometimes", "Sometimes", "Sometimes"
    ),
    dass2_t1 = c(
      "Sometimes", "Never", "Often", "Sometimes", "Sometimes", "Never", 
      "Sometimes", "Sometimes", "Often", "Sometimes"
    ),
    dass3_t1 = c(
      "Often", "Sometimes", "Never", "Never", "Never", "Sometimes", "Never",
      "Never", "Sometimes", "Sometimes"
    ),
    dass4_t1 = c(
      "Never", "Never", "Never", "Never", "Never", "Sometimes", "Never", 
      "Never", "Never", "Sometimes"
    ),
    dass5_t1 = c(
      "Almost Always", "Sometimes", "Never", "Sometimes", "Never", "Sometimes",
      "Sometimes", "Never", "Almost Always", "Often"
    )
  ),
  row.names = c(NA, -10L),
  class = "data.frame"
)

opt_dass <- list(
  `0` = "Never", `1` = "Sometimes", `2` = "Often", `3` = "Almost Always"
)

data %>% 
  mutate(across(dass1_t1:dass5_t1, ~ as.numeric(lookup(.x, opt_dass))))
#>    id dass1_t1 dass2_t1 dass3_t1 dass4_t1 dass5_t1
#> 1   1        1        1        2        0        3
#> 2   2        2        0        1        0        1
#> 3   3        2        2        0        0        0
#> 4   4        3        1        0        0        1
#> 5   5        1        1        0        0        0
#> 6   6        1        0        1        1        1
#> 7   7        1        1        0        0        1
#> 8   8        1        1        0        0        0
#> 9   9        1        2        1        0        3
#> 10 11        1        1        1        1        2

Created on 2020-11-04 by the reprex package (v0.3.0)

To learn more about the lookup() function, take a look at the qdapTools reference manual.

score 0 · Answer 5 · answered Nov 04 '20 at 22:04

The error you are seeing is because you can’t use an expression to build the name of a variable to assign to on the left-hand side of <-. For more details, see this answer.
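One way to keep the for loop from the question is to build the column name as a string and use [[ ]] on both sides of the assignment, since [[ ]] accepts a character name. The sketch below uses base R only, a small made-up stand-in for the question's data, and a named vector in place of the named list. Note also that the question's paste0("dass_", i, "_t1") produces "dass_1_t1", while the columns are named "dass1_t1".

```r
# Small stand-in for the question's data (hypothetical values)
data <- data.frame(
  id = c("1", "2", "3"),
  dass1_t1 = c("Sometimes", "Often", "Almost Always"),
  dass2_t1 = c("Never", "Sometimes", "Often"),
  stringsAsFactors = FALSE
)

# Named vector used as the lookup table
opt_dass <- c("Never" = 0, "Sometimes" = 1, "Often" = 2, "Almost Always" = 3)

for (i in 1:2) {
  # Build the column name as a string (no underscore after "dass")
  col <- paste0("dass", i, "_t1")
  # Read the column by name, look up each label, and write the result back
  data[[col]] <- unname(opt_dass[data[[col]]])
}

data
#>   id dass1_t1 dass2_t1
#> 1  1        1        0
#> 2  2        2        1
#> 3  3        3        2
```

With this pattern attach() is not needed, because the loop reads from and writes to the data frame directly.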

For recoding multiple variables, you could store your recode mappings in a named list with an element for each variable you want to recode:

recodes <- list(
  q1 = c(a = 1, b = 2, c = 3),
  q2 = c(x = 9, y = 8, z = 7)
)

tbl <- data.frame(
  id = c(1, 2, 3),
  q1 = c("b", "a", "c"),
  q2 = c("z", "z", "x")
)

tbl
#>   id q1 q2
#> 1  1  b  z
#> 2  2  a  z
#> 3  3  c  x

Then loop over the variables in your data, and pick the recode mapping to use from the list based on the variable name:

for (variable in names(tbl)) {
  if (!variable %in% names(recodes)) {
    next # skip if the variable isn't recoded
  }
  
  new_variable <- paste0(variable, "n")
  
  old_value <- tbl[[variable]]
  new_value <- recodes[[variable]][old_value]
  
  tbl[[new_variable]] <- new_value
}

tbl
#>   id q1 q2 q1n q2n
#> 1  1  b  z   2   7
#> 2  2  a  z   1   7
#> 3  3  c  x   3   9
