I would like to reshape my old data.frame (data) from long to wide, using two variables to define the columns of the new data.frame (new.data). Specifically, I want to take the two variables data$assessment and data$question_id and:
1) Figure out how many data$question_id values are in each data$assessment, so that
2) each data$question_id becomes a column in the new data.frame, and
3) each data$question_id is relabeled to indicate the assessment it belongs to (e.g. Assessment1 and Q1 become Assessment1_Q1; Assessment1 and Q3 become Assessment1_Q3).
However, there are two things to consider:
1) The assessments have different numbers of questions.
2) Not all questions were filled out by every participant (i.e. there is missing data).
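For what it's worth, the three steps above could be sketched in base R roughly as follows. This is only a sketch on a made-up toy data.frame with the same four column names; the values are invented for illustration, and the helper column item and the object n_questions are hypothetical names, not part of the original data.

```r
# Toy stand-in for `data`: same four columns, invented values.
data <- data.frame(
  subjectid      = c(1L, 1L, 1L, 2L, 2L, 3L),
  assessment     = factor(c("Assessment1", "Assessment1", "Assessment2",
                            "Assessment1", "Assessment2", "Assessment2")),
  question_id    = factor(c("Q1", "Q2", "Q1", "Q1", "Q1", "Q2")),
  question_value = factor(c("6", "7", "1", "5", "2", "9"))
)

# Step 1: number of distinct questions per assessment.
n_questions <- tapply(as.character(data$question_id), data$assessment,
                      function(q) length(unique(q)))

# Step 3: combined label such as "Assessment1_Q1" (`item` is a hypothetical name).
data$item <- paste(data$assessment, data$question_id, sep = "_")

# Step 2: one column per label, one row per subject.
# Subject/label combinations with no row in `data` become NA.
new.data <- reshape(
  data[, c("subjectid", "item", "question_value")],
  idvar = "subjectid", timevar = "item", direction = "wide"
)

# reshape() prefixes the new columns with "question_value."; strip that prefix.
names(new.data) <- sub("^question_value\\.", "", names(new.data))
rownames(new.data) <- NULL
```

Because the columns are built from whatever labels actually occur, assessments with different numbers of questions need no special handling, and unanswered questions simply come out as NA rather than blank. As far as I know, reshape() keeps only the first row (with a warning) when a subject has more than one row for the same label, so it may be worth checking for duplicates first.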
Here's the general structure of the old data.frame:
> dim(data)
[1] 42106 4
> colnames(data)
[1] "subjectid" "assessment" "question_id" "question_value"
> lapply(data, class)
$subjectid
[1] "integer"
$assessment
[1] "factor"
$question_id
[1] "factor"
$question_value
[1] "factor"
> length(unique(data$subjectid))
[1] 96
> table(data$assessment)
 Assessment1  Assessment2
        1362         2102
 Assessment3  Assessment4
         966          864
 Assessment5  Assessment6
        1183         2093
 Assessment7  Assessment8
         181        14208
 Assessment9 Assessment10
        6734         2044
Assessment11 Assessment12
        3129         2185
Assessment13 Assessment14
        3962         1093
> length(unique(data$question_id))
[1] 431
I want my new data.frame (new.data) to have rows representing participants (N=96), columns representing the assessment and question (e.g. Assessment1_Q1), and cell values taken from data$question_value, representing each participant's score on a specific assessment/question. Calling dim(new.data) should yield 96 432 (one row per participant; 431 question columns plus subjectid).
It should look something like this:
subjectid Assessment1_Q1 Assessment1_Q2 Assessment1_Q3 Assessment1_Q4 Assessment2_Q1 Assessment2_Q2 Assessment2_Q3 Assessment3_Q1 Assessment3_Q2 Assessment3_Q3 Assessment4_Q1 Assessment4_Q2
1 6 7 5 4 1 2 4 8 6
2 5 9 3 1 2 4 8 2 3
3 3 9 5 4 5 9 2 3 7 5 5
As you can see, the new data.frame's rows are participants, the columns are assessments/questions, and the values are the participants' responses (missing responses are left blank).