I have a dataframe (d1) with respondents' answers to a series of questions (Q1-Q12). The questions are the column names, and each row holds one respondent's answers across all the columns: row 1 is the first respondent, row 2 the next, and so on. Another dataframe (d2) has a questions column with Q1-Q12 as the rows and a Correct_answers column with the correct answers to questions Q1-Q12.

My question is how to compare the respondents' answers in d1 with the correct answers in d2, changing each value in d1 to 1 if the respondent answered correctly and 0 if the respondent answered incorrectly.

Thanks

joey5
    Please provide a small example dataframe(s) and the expected output. See http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – coffeinjunky May 02 '17 at 22:30

1 Answer

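Here is a way to work that out using functions from dplyr and tidyr, specifically gather and spread: reshape the answers to long format, join the correct answers by question, compare, and reshape back to wide format.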

library(dplyr)
library(tidyr)

df1 <- tribble(
    ~respondent, ~Q1, ~Q2, ~Q3, ~Q4,
              1, "A", "B", "A", "C",
              2, "B", "B", "B", "C",
              3, "A", "C", "B", "C",
              4, "A", "B", "B", "A"
)

df2 <- tribble(
    ~question, ~correct,
    "Q1",       "A",
    "Q2",       "B",
    "Q3",       "B",
    "Q4",       "C"
)


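# Reshape to long format: one row per respondent/question pair
df1 %>% 
    gather(question, answer, -respondent) %>%
    # Attach the correct answer for each question
    left_join(df2, by = "question") %>%
    # Score 1 for a correct answer, 0 otherwise
    mutate(compare = ifelse(answer == correct, 1, 0)) %>%
    select(-answer, -correct) %>%
    # Reshape back to wide format: one row per respondent
    spread(question, compare)

#> # A tibble: 4 × 5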
#>   respondent    Q1    Q2    Q3    Q4
#> *      <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1          1     1     1     0     1
#> 2          2     0     1     1     1
#> 3          3     1     0     1     1
#> 4          4     1     1     1     0
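
In tidyr 1.0.0 and later, gather and spread are superseded by pivot_longer and pivot_wider. The following is a minimal sketch of the same reshape, join, compare, reshape steps with those functions, assuming the same example data as above; the names `scored` and `score` are introduced here only for illustration.

```r
library(dplyr)
library(tidyr)

# Same example data as above
df1 <- tribble(
    ~respondent, ~Q1, ~Q2, ~Q3, ~Q4,
              1, "A", "B", "A", "C",
              2, "B", "B", "B", "C",
              3, "A", "C", "B", "C",
              4, "A", "B", "B", "A"
)

df2 <- tribble(
    ~question, ~correct,
    "Q1",       "A",
    "Q2",       "B",
    "Q3",       "B",
    "Q4",       "C"
)

scored <- df1 %>%
    # Long format: one row per respondent/question pair
    pivot_longer(-respondent, names_to = "question", values_to = "answer") %>%
    # Attach the correct answer for each question
    left_join(df2, by = "question") %>%
    # TRUE/FALSE converted to 1/0
    mutate(score = as.integer(answer == correct)) %>%
    select(respondent, question, score) %>%
    # Back to wide format: one row per respondent
    pivot_wider(names_from = question, values_from = score)

scored
```

Here the scores are stored as integers rather than doubles, since `as.integer()` is used in place of `ifelse(..., 1, 0)`.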
Julia Silge
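
For comparison, the values in d1 can also be overwritten in place with base R alone. This is only a sketch under assumptions: d1 is taken to contain nothing but the question columns, d2 is taken to have the columns `questions` and `Correct_answers` as described in the question, and the small example data and the helper name `key` are made up here for illustration.

```r
# Hypothetical example data shaped as described in the question
d1 <- data.frame(
    Q1 = c("A", "B", "A", "A"),
    Q2 = c("B", "B", "C", "B"),
    Q3 = c("A", "B", "B", "B"),
    Q4 = c("C", "C", "C", "A"),
    stringsAsFactors = FALSE
)

d2 <- data.frame(
    questions = c("Q1", "Q2", "Q3", "Q4"),
    Correct_answers = c("A", "B", "B", "C"),
    stringsAsFactors = FALSE
)

# Look up the correct answer for each column of d1, in d1's column order
key <- d2$Correct_answers[match(names(d1), d2$questions)]

# Compare each column with its correct answer and overwrite it with 1/0;
# d1[] keeps d1 a data frame with the same column names
d1[] <- Map(function(answers, correct) as.integer(answers == correct), d1, key)

d1
```

Using `match()` on the column names means the rows of d2 do not have to be in the same order as the columns of d1.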