I'm trying to recode answers using a vector that contains the correct answers. I wrote a for loop that creates a new column (with the coded answer) on each iteration, taking the column name from a vector of possible names.

However, it seems that mutate does not accept column names supplied in a vector. I've tried several different vectors and some paste0() combinations, but nothing seems to work.

Here is my reproducible code:

library(dplyr)
library(tibble)

correct = c(4, 5, 2, 2, 2, 3, 3, 5, 4, 5, 2, 1, 3, 4, 2, 2, 2, 4, 3, 1, 1, 5, 4, 1, 3, 2)

sub1 = c(3, 5, 1, 5, 4, 3, 2, 5, 4, 3, 4, 4, 4, 1, 5, 1, 4, 3, 3, 4, 3, 2, 4, 2, 3, 4)

df = t(data.frame(sub1))
colnames(df) = paste0("P", 1:26)

new_names = paste0("P", 1:26, "_coded")

for(i in 1:26){


  df = as.tibble(df) %>% 
    mutate(new_names = case_when(.[i] == correct[i] ~ 1, 
                     .[i] != correct[i] ~ 0, 
                     T ~ 9999999))

  print(df) # to know what's going on.

}

Also, I know that .dots can receive names in a vector (I think), but I don't quite understand how to use it with case_when inside mutate().

Other ways to create new columns with the recoded values are also welcome.

UPDATE: My expected output would be the original data frame with 26 new columns, P1_COD:P26_COD with possible values 1 (if correct) and 0 (if incorrect).

Something like this (I just created four columns with 1s and 0s as an example).

df %>% 
  mutate(P1_COD = 1,
         P2_COD = 0,
         P3_COD = 1,
         P4_COD = 1)
niklai
  • Why the extremely wide form? In long form it's simple: `data_frame(correct, sub1, cod = as.integer(correct == sub1))` – alistaire May 24 '17 at 03:32
  • I don't recommend this, but if you want to keep it in wide form, this should work: `df <- cbind(df, setNames(as.data.frame(t(as.numeric(mapply(\`==\`, df, correct)))), nm = paste0(colnames(df), "_COD")))`. – Jake Fisher Mar 06 '18 at 04:03

1 Answer

The data is not in the format that dplyr handles best. I would suggest restructuring it into long format (one row per subject and question). The case_when then becomes trivial and no for loop is required.

See the tidyr documentation at tidyverse.org for more on data formats.

Here is an example of the long format using your sample data. I also added two more subjects with random answers.
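As a minimal sketch of that restructuring, assuming tidyr 1.0 or later (for pivot_longer) and a one-row wide table like the one in the question, the reshape could look like this. The names wide and long are illustrative only.

```r
library(dplyr)
library(tidyr)

# One subject's answers in wide form: columns P1..P26, as in the question.
sub1 <- c(3, 5, 1, 5, 4, 3, 2, 5, 4, 3, 4, 4, 4, 1, 5, 1, 4, 3, 3, 4, 3, 2, 4, 2, 3, 4)
wide <- as_tibble(setNames(as.list(sub1), paste0("P", 1:26)))

# Reshape to one row per question. names_prefix strips the leading "P",
# and the remaining text is converted to an integer question number.
long <- wide %>%
  pivot_longer(
    cols = everything(),
    names_to = "qNum",
    names_prefix = "P",
    values_to = "response"
  ) %>%
  mutate(qNum = as.integer(qNum))
```

With several subjects, the same call would keep a subject column out of cols (for example cols = -subject) so that each subject contributes 26 rows.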

library(tidyverse)
# sub1 is the response vector defined in the question
responses <- tibble(
  subject = rep(1:3, each = 26),
  qNum = rep(1:26, 3),
  response = c(sub1,
               sample(5, 26, replace = TRUE),
               sample(5, 26, replace = TRUE)))

The answers can be created and then merged:

answers <- tibble(
  qNum = 1:26,
  answer = correct)
# join on the question number explicitly
df <- left_join(responses, answers, by = "qNum")

Next, score the answers using dplyr::case_when:

df <- df %>% mutate(score = case_when(response == answer ~ 1,
                                TRUE ~ 0))

Note: the TRUE ~ 0 may be confusing at first. It is the catch-all branch: every row not matched by an earlier condition (here, every response that does not equal the answer) gets 0. The resulting df/tibble:

# A tibble: 26 x 5
   subject  qNum response answer score
     <dbl> <int>    <dbl>  <dbl> <dbl>
 1       1     1        3      4     0
 2       1     2        5      5     1
 3       1     3        1      2     0
 4       1     4        5      2     0
 5       1     5        4      2     0
 6       1     6        3      3     1
 7       1     7        2      3     0
 8       1     8        5      5     1
 9       1     9        4      4     1
10       1    10        3      5     0
# ... with 16 more rows
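One caveat with the TRUE ~ 0 catch-all: a missing response also falls through to it, because comparing NA with anything gives NA rather than TRUE. The following is a small sketch on hypothetical toy data (toy, score_fallback, and score_na are made-up names) showing one way to keep skipped questions distinguishable, instead of a sentinel value like the 9999999 in the question.

```r
library(dplyr)

# Hypothetical toy data: the third response is missing.
toy <- tibble(response = c(4, 2, NA), answer = c(4, 5, 3))

result <- toy %>%
  mutate(
    # TRUE ~ 0 catches wrong answers and NA comparisons alike.
    score_fallback = case_when(response == answer ~ 1, TRUE ~ 0),
    # Testing for NA first keeps a skipped question as NA.
    score_na = case_when(
      is.na(response) ~ NA_real_,
      response == answer ~ 1,
      TRUE ~ 0
    )
  )
```

Whether a skipped question should count as 0 or stay NA depends on how the test is scored, so treat this as an option rather than a fix.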

If you want to convert this to "wide" format, use tidyr::spread:

df %>%
  select(-response, -answer) %>% 
  spread(qNum, score, sep = ".")
# A tibble: 3 x 27
  subject qNum.1 qNum.2 qNum.3 qNum.4 qNum.5 qNum.6 qNum.7 qNum.8 qNum.9 qNum.10
*   <int>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>   <dbl>
1       1      0      1      0      0      0      1      0      1      1       0
2       2      0      0      0      0      1      0      0      0      0       0
3       3      0      0      0      0      1      0      0      0      0       0
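In current tidyr, spread is superseded by pivot_wider. As a sketch, assuming tidyr 1.1.0 or later (for the names_glue argument), the wide table can be built with the P1_COD-style names requested in the question. The scores tibble below is hypothetical stand-in data, not the output above.

```r
library(dplyr)
library(tidyr)

# Hypothetical long-format scores: two subjects, three questions.
scores <- tibble(
  subject = rep(1:2, each = 3),
  qNum    = rep(1:3, times = 2),
  score   = c(0, 1, 0, 1, 1, 0)
)

# One row per subject; each question becomes a column named P<qNum>_COD.
wide_scores <- scores %>%
  pivot_wider(
    names_from  = qNum,
    values_from = score,
    names_glue  = "P{qNum}_COD"
  )
```

Applied to the df above, the same call would follow select(-response, -answer), just as in the spread version.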
Matt L.