
I have a data frame with three columns: participant ID, questionID, and a column indicating whether they gave the correct response (1) or not (0). It looks like this:

> head(df)
# A tibble: 6 x 3
     ID questionID correct
  <dbl>      <int>   <dbl>
1     1          1       1
2     2          2       0
3     3          3       1
4     4          4       0
5     5          5       0
6     6          6       1

It can be recreated using:

library(tibble)

set.seed(0)
df <- tibble(ID = seq(1, 100, 1),
             questionID = rep(seq(1, 10), 10),
             correct = base::sample(c(0, 1), size = 100, replace = TRUE))

Now I would like each question to have its own column, with the ultimate goal of fitting a 2PL model to the data. For that purpose, the data should have one row per participant and 11 columns (ID plus 10 question columns).
How do I achieve this?

JNab

2 Answers


You can use pivot_wider from the tidyr package:

library(tidyr)  # provides pivot_wider() and re-exports %>%

df %>%
  pivot_wider(names_from = questionID,
              values_from = correct,
              names_prefix = "questionID_")

# A tibble: 100 x 11
      ID questionID_1 questionID_2 questionID_3 questionID_4
   <dbl>        <dbl>        <dbl>        <dbl>        <dbl>
 1     1            1           NA           NA           NA
 2     2           NA            0           NA           NA
 3     3           NA           NA            0           NA
 4     4           NA           NA           NA            1
 5     5           NA           NA           NA           NA
 6     6           NA           NA           NA           NA
 7     7           NA           NA           NA           NA
 8     8           NA           NA           NA           NA
 9     9           NA           NA           NA           NA
10    10           NA           NA           NA           NA
# ... with 90 more rows, and 6 more variables: questionID_5 <dbl>,
#   questionID_6 <dbl>, questionID_7 <dbl>, questionID_8 <dbl>,
#   questionID_9 <dbl>, questionID_10 <dbl>
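
The NAs in this output come from the sample data rather than from pivot_wider: every ID occurs exactly once, so each participant has a value for only one question. As a minimal sketch, assuming a complete design in which each of 10 participants answers all 10 questions (the names df_full and wide_full are made up for this illustration), the same call gives one fully filled row per participant:

```r
library(tibble)
library(tidyr)

set.seed(0)
# Hypothetical complete design: 10 participants, each answering all 10 questions
df_full <- tibble(ID = rep(1:10, each = 10),
                  questionID = rep(1:10, times = 10),
                  correct = sample(c(0, 1), size = 100, replace = TRUE))

# Same reshape as above: one column per question, one row per participant
wide_full <- df_full %>%
  pivot_wider(names_from = questionID,
              values_from = correct,
              names_prefix = "questionID_")

dim(wide_full)    # 10 rows, 11 columns (ID plus 10 question columns)
anyNA(wide_full)  # FALSE, because every ID/question pair is present
```

If real data still shows NAs after reshaping, those are questions the participant did not answer.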
Aron Strandberg

With data.table, you can use dcast:

library(data.table)

df <- data.frame(ID = c(1, 2, 3, 4, 5, 6), questionID = c(1, 22, 13, 4, 35, 8), correct = c(1, 0, 1, 0, 0, 1))
df
 ID questionID correct
1  1          1       1
2  2         22       0
3  3         13       1
4  4          4       0
5  5         35       0
6  6          8       1

setDT(df)
dcast(df, ID ~ questionID, value.var = "correct")
   ID  1  4  8 13 22 35
1:  1  1 NA NA NA NA NA
2:  2 NA NA NA NA  0 NA
3:  3 NA NA NA  1 NA NA
4:  4 NA  0 NA NA NA NA
5:  5 NA NA NA NA NA  0
6:  6 NA NA  1 NA NA NA

# To replace the NAs with a value of your choice, pass fill to dcast(), e.g. 0
dcast(df, ID ~ questionID, value.var = "correct", fill = 0)
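
Since the stated goal is a 2PL model, here is a rough sketch of the remaining steps after the reshape. It assumes complete, randomly generated data for 200 hypothetical participants and uses the mirt package, which is my own choice and not something the question mentions; the names long, wide, resp and fit are illustrative only. The fitting step is skipped if mirt is not installed.

```r
library(data.table)

set.seed(0)
# Hypothetical complete data: 200 participants x 10 questions, random answers
long <- data.table(ID = rep(1:200, each = 10),
                   questionID = rep(1:10, times = 200),
                   correct = sample(c(0, 1), size = 2000, replace = TRUE))

# Long to wide: one row per participant, one column per question
wide <- dcast(long, ID ~ questionID, value.var = "correct")

# Keep only the item columns; the ID is not an item response
resp <- as.matrix(wide[, !"ID"])
colnames(resp) <- paste0("Q", colnames(resp))

# Assumption: mirt is installed; otherwise this block is skipped
if (requireNamespace("mirt", quietly = TRUE)) {
  library(mirt)
  fit <- mirt(resp, model = 1, itemtype = "2PL", verbose = FALSE)
  print(coef(fit, simplify = TRUE)$items)
}
```

With random answers the estimated item parameters are not meaningful; the point is only the shape of the input, a matrix with one row per participant and one column per question.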
fra