Using R dplyr::mutate() with a for loop and dynamic variables

Question

Disclaimer: I think there is a much more efficient solution (perhaps an anonymous function with a list or *apply functions?) hence why I have come to you much more experienced people for help!

The data

Let's say I have a df with participant responses to 3 question As and 3 question Bs e.g.

qa1, qa2, qa3, qb1, qb2, qb3   
1, 3, 1, 2, 4, 4  
1, 3, 2, 2, 1, 4  
2, 3, 1, 2, 1, 4  
1, 3, 2, 1, 1, 3

EDIT: df also contains other columns with irrelevant data!

I have a vector with correct answers to each of qa1-3 and qb1-3 in sequence with the columns.

correct_answer <- c(1,3,2,2,1,4)

(i.e. for qa1,qa2,qa3,qb1,qb2,qb3)

Desired manipulation

I want to create a new column per question (e.g. qa1_correct), coding for whether the participant has responded correctly (1) or incorrectly (0) based on matching each response in df with corresponding answer in correct_answer. Ideally I would end up with:

qa1, qa2, qa3, qb1, qb2, qb3, qa1_correct, qa2_correct, qa3_correct ...     
1, 3, 1, 2, 4, 4, 1, 1, 0, ...   
1, 3, 2, 2, 1, 4, 1, 1, 1, ...   
2, 3, 1, 2, 1, 4, 0, 1, 0, ...   
1, 3, 2, 1, 1, 3, 1, 1, 1, ...

Failed Attempt

This is my attempt for question As only (would repeat for Bs) but it doesn't work (maybe wrong function paste0()?):

index <- c(1:3)  
    

    for (i in index) {
    df <- df %>% mutate(paste0("qa",i,"_correct") = 
                               case_when(paste0("qa"i) == correct_answer[i] ~ 1, 
                                         paste0("qa"i) != correct_answer[i] ~ 0))
    }

Many thanks for any guidance!

Is a solution without `mutate()` an option? — KacZdr, Jul 23 '21 at 14:49

score 2 · Answer 1 · answered Jul 23 '21 at 15:28

You can combine mutate and across.

Code 1: Correct_answer as vector

df  %>%
  mutate(across(everything(),
                ~as.numeric(.x == correct_answer[names(df) == cur_column()]),
                .names = "{.col}_correct"))

Code 2: Correct_answer as data.frame (df_correct)

correct_answer <- c(1,3,2,2,1,4) 
df_correct <- data.frame(
  matrix(correct_answer, ncol = length(correct_answer))
)
colnames(df_correct) <- names(df)

df  %>%
  mutate(across(everything(),
                .fns = ~as.numeric(.x == df_correct[,cur_column()]),
                .names = "{.col}_correct"))

Output

  qa1 qa2 qa3 qb1 qb2 qb3 qa1_correct qa2_correct qa3_correct qb1_correct qb2_correct qb3_correct
1   1   3   1   2   4   4           1           1           0           1           0           1
2   1   3   2   2   1   4           1           1           1           1           1           1
3   2   3   1   2   1   4           0           1           0           1           1           1
4   1   3   2   1   1   3           1           1           1           0           1           0

Thanks! If I had other columns with totally different named variables, could I replace everything() with e.g. select(starts_with("q"))? — Coco Newton, Jul 26 '21 at 13:01
You won't need `select`, just replace `everything()` with `starts_with("q")`: `df %>% mutate(across(starts_with("q"), ~as.numeric(.x == correct_answer[names(df) == cur_column()]), .names = "{.col}_correct"))` — tamtam, Jul 26 '21 at 13:08
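
One caveat on the snippet in the comment above: `correct_answer[names(df) == cur_column()]` looks the answer up by column position, so it appears to rely on the question columns sitting in the same positions in df as in correct_answer. A minimal sketch that looks answers up by name instead is shown here. The `id` column and the `scored` name are made up for illustration, and naming the elements of correct_answer is an assumption, not something the question does.

```r
library(dplyr)

# Sample data from the question plus one unrelated column (hypothetical `id`)
df <- data.frame(
  id  = c("p1", "p2", "p3", "p4"),
  qa1 = c(1, 1, 2, 1), qa2 = c(3, 3, 3, 3), qa3 = c(1, 2, 1, 2),
  qb1 = c(2, 2, 2, 1), qb2 = c(4, 1, 1, 1), qb3 = c(4, 4, 4, 3)
)

# Name the answer key so each lookup goes by column name, not position
correct_answer <- c(qa1 = 1, qa2 = 3, qa3 = 2, qb1 = 2, qb2 = 1, qb3 = 4)

# Score only the columns that appear in the answer key
scored <- df %>%
  mutate(across(all_of(names(correct_answer)),
                ~ as.numeric(.x == correct_answer[[cur_column()]]),
                .names = "{.col}_correct"))
```

With the key named, column order in df no longer matters, and `all_of()` raises an error if a keyed column is missing rather than silently skipping it.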

Anoushiravan R · Answer 2 · 2021-07-23T16:41:12.050

You can also use the following solution in base R:

cbind(df, 
      do.call(cbind, mapply(function(x, y) as.data.frame({+(x == y)}), 
                            df, correct_answer, SIMPLIFY = FALSE)) |>
        setNames(paste0(names(df), "_corr")))

  qa1 qa2 qa3 qb1 qb2 qb3 qa1_corr qa2_corr qa3_corr qb1_corr qb2_corr qb3_corr
1   1   3   1   2   4   4        1        1        0        1        0        1
2   1   3   2   2   1   4        1        1        1        1        1        1
3   2   3   1   2   1   4        0        1        0        1        1        1
4   1   3   2   1   1   3        1        1        1        0        1        0

Or a potential tidyverse solution could be:

library(dplyr)
library(tidyr)
library(purrr)

df %>%
  mutate(output = pmap(df, ~ setNames(+(c(...) == correct_answer), 
                                             paste0(names(df), "_corr")))) %>%
  unnest_wider(output)

  qa1 qa2 qa3 qb1 qb2 qb3 qa1_corr qa2_corr qa3_corr qb1_corr qb2_corr qb3_corr
1   1   3   1   2   4   4        1        1        0        1        0        1
2   1   3   2   2   1   4        1        1        1        1        1        1
3   2   3   1   2   1   4        0        1        0        1        1        1
4   1   3   2   1   1   3        1        1        1        0        1        0

thank you very much! how could I adapt for when df contains other column variables aside from qa/qb's? — Coco Newton, Jul 26 '21 at 13:03
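
One way the base R idea might be adapted when df has extra columns is to select the question columns first and compare only those. This is a sketch under stated assumptions: the `id` column is hypothetical, the regular expression assumes the question columns are named like qa1 or qb3, and correct_answer is assumed to be in the same order as those columns appear in df.

```r
# Sample data from the question plus one unrelated column (hypothetical `id`)
df <- data.frame(
  id  = c("p1", "p2", "p3", "p4"),
  qa1 = c(1, 1, 2, 1), qa2 = c(3, 3, 3, 3), qa3 = c(1, 2, 1, 2),
  qb1 = c(2, 2, 2, 1), qb2 = c(4, 1, 1, 1), qb3 = c(4, 4, 4, 3)
)
correct_answer <- c(1, 3, 2, 2, 1, 4)

# Pick out only the question columns; the pattern is an assumption about naming
q_cols <- grep("^q[ab][0-9]+$", names(df), value = TRUE)
stopifnot(length(q_cols) == length(correct_answer))

# Compare each question column with its answer and append the 0/1 results
df[paste0(q_cols, "_corr")] <- Map(function(x, y) +(x == y),
                                   df[q_cols], correct_answer)
```

The length check guards against the pattern matching more or fewer columns than there are answers.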

score 2 · Answer 3 · answered Jul 23 '21 at 16:06

This may also be an alternative. It needs R 4.1.0 or later, where apply() gained a simplify argument (default TRUE) and the \(x) shorthand for anonymous functions was introduced.

df <- read.table(header = TRUE, text = 'qa1, qa2, qa3, qb1, qb2, qb3   
1, 3, 1, 2, 4, 4  
1, 3, 2, 2, 1, 4  
2, 3, 1, 2, 1, 4  
1, 3, 2, 1, 1, 3', sep = ',')

df
#>   qa1 qa2 qa3 qb1 qb2 qb3
#> 1   1   3   1   2   4   4
#> 2   1   3   2   2   1   4
#> 3   2   3   1   2   1   4
#> 4   1   3   2   1   1   3

correct_answer <- c(1,3,2,2,1,4)

cbind(df, 
      setNames(as.data.frame(t(apply(df, 1, 
                                     \(x) +(x == correct_answer)))), 
               paste0(names(df), '_correct')))
#>   qa1 qa2 qa3 qb1 qb2 qb3 qa1_correct qa2_correct qa3_correct qb1_correct
#> 1   1   3   1   2   4   4           1           1           0           1
#> 2   1   3   2   2   1   4           1           1           1           1
#> 3   2   3   1   2   1   4           0           1           0           1
#> 4   1   3   2   1   1   3           1           1           1           0
#>   qb2_correct qb3_correct
#> 1           0           1
#> 2           1           1
#> 3           1           1
#> 4           1           0

Created on 2021-07-23 by the reprex package (v2.0.0)

Coco Newton · Accepted Answer · 2021-07-23T15:00:47.233

EDIT: this works with the addition of sym().
I found a related solution in "Paste variable name in mutate (dplyr)", but on its own it only pasted 0's.

for (i in index) {
  df <- df %>%
    mutate(!!paste0("qa", i, "_correct") :=
             case_when(!!sym(paste0("qa", i)) == correct_answer[i] ~ 1,
                       !!sym(paste0("qa", i)) != correct_answer[i] ~ 0))
}
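
Some background on why sym() matters here: without it, `!!paste0("qa", i)` injects the string "qa1" itself, so the comparison is between a string and a number, which is never equal, and every row gets 0. The new column name also has to go on the left of `:=` rather than `=`, because `=` does not accept a computed name. A loop-based sketch that avoids `!!sym()` by using the `.data` pronoun and a glue-style name might look like the following. It assumes dplyr 1.0 or later, and the sample df holds only the three A columns from the question.

```r
library(dplyr)

# The three A-question columns from the question's sample data
df <- data.frame(qa1 = c(1, 1, 2, 1), qa2 = c(3, 3, 3, 3), qa3 = c(1, 2, 1, 2))
correct_answer <- c(1, 3, 2)

for (i in 1:3) {
  col <- paste0("qa", i)
  df <- df %>%
    # "{col}_correct" builds the new name; .data[[col]] fetches the column by string
    mutate("{col}_correct" := as.numeric(.data[[col]] == correct_answer[i]))
}
```

Here `as.numeric()` on the logical comparison stands in for the two-branch `case_when()`; the two differ only if a response is NA, in which case both leave the result as NA.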

edited Jul 23 '21 at 15:00

answered Jul 23 '21 at 13:48


score 0 · Answer 5 · answered Jul 23 '21 at 13:53

Try this:

df_new <- cbind(df, t(apply(df, 1, function(x) as.numeric(x == correct_answer))))

Mohanasundaram


no, that didn't work - just generated blank columns with 0's – Coco Newton Jul 23 '21 at 14:00
@CocoNewton, which R version are you using? – AnilGoyal Jul 23 '21 at 16:20
