Reshape data frame using a column's unique values as new columns and with missing data

Question

I would like to reshape my old data.frame data from long to wide using two variables as the columns for the new data.frame new.data. Specifically, I want to take the two variables data$assessment and data$question_id:

1) Figure out how many data$question_id are in each data$assessment, so that

2) Each data$question_id represents a column in the new data.frame, and

3) Relabel each data$question_id to indicate the assessment it belongs to (e.g. Assessment1 and Q1 becomes Assessment1_Q1, Assessment1 and Q3 becomes Assessment1_Q3).

However, there are two things to consider:

1) The assessments have different numbers of questions

2) Not all questions were filled out by the participant (i.e. missing data)

Here's the general structure of the old data.frame:

> dim(data)
[1] 42106     4
> colnames(data)
[1] "subjectid"      "assessment"     "question_id"    "question_value"
> lapply(data, class)
$subjectid
[1] "integer"

$assessment
[1] "factor"

$question_id
[1] "factor"

$question_value
[1] "factor"
> length(unique(data$subjectid))
[1] 96
> table(data$assessment)

 Assessment1              Assessment2 
        1362                     2102 
 Assessment3              Assessment4 
         966                      864 
 Assessment5              Assessment6 
        1183                     2093 
 Assessment7              Assessment8 
         181                    14208 
 Assessment9             Assessment10
        6734                     2044 
Assessment11             Assessment12
        3129                     2185 
Assessment13             Assessment14
        3962                     1093 
> length(unique(data$question_id))
[1] 431

I want my new data.frame new.data to have rows representing participants (N=96), columns representing the assessment and question (e.g. Assessment1_Q1), and cells holding the data$question_value for each participant's score on a specific assessment/question. Using dim(new.data) should yield 96 432 (one subjectid column plus 431 question columns).

It should look something like this

subjectid  Assessment1_Q1  Assessment1_Q2  Assessment1_Q3  Assessment1_Q4    Assessment2_Q1  Assessment2_Q2  Assessment2_Q3  Assessment3_Q1  Assessment3_Q2  Assessment3_Q3  Assessment4_Q1  Assessment4_Q2
        1                               6               7                               5               4               1               2               4               8               6               
        2               5                                               9                               3               1               2               4               8               2               3
        3               3               9               5               4               5               9               2                               3               7               5               5

As you can see, the new data.frame's rows are participants, the columns are Assessments/Questions, and the values are the participants' responses (missing responses are left blank).
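
A minimal base R sketch of the reshape described above, on a made-up toy data set (the values and the helper column `item` are illustrative assumptions, not taken from the real data). It pastes `assessment` and `question_id` into one label and uses `reshape()` to go from long to wide, so unanswered questions come out as `NA`:

```r
# Toy long-format data with the same four columns as in the question.
# The values are invented for illustration only.
long <- data.frame(
  subjectid      = c(1, 1, 2, 2, 3, 3, 3),
  assessment     = c("Assessment1", "Assessment2", "Assessment1", "Assessment2",
                     "Assessment1", "Assessment1", "Assessment2"),
  question_id    = c("Q2", "Q1", "Q1", "Q1", "Q1", "Q2", "Q1"),
  question_value = c(6, 4, 5, 3, 3, 9, 5),
  stringsAsFactors = FALSE
)

# Combine assessment and question into one label, e.g. "Assessment1_Q1".
# `item` is a hypothetical helper column, not part of the original data.
long$item <- paste(long$assessment, long$question_id, sep = "_")

# Long to wide: one row per subject, one column per item.
wide <- reshape(
  long[, c("subjectid", "item", "question_value")],
  idvar     = "subjectid",
  timevar   = "item",
  v.names   = "question_value",
  direction = "wide"
)

# reshape() prefixes the new columns with "question_value."; strip the prefix.
names(wide) <- sub("^question_value\\.", "", names(wide))
rownames(wide) <- NULL

# Put subjectid first and the item columns in (lexical) alphabetical order.
wide <- wide[, c("subjectid", sort(setdiff(names(wide), "subjectid")))]
```

Subject 1 has no row for Assessment1/Q1 in the toy data, so that cell is `NA` in `wide`. This assumes each subjectid/assessment/question_id combination occurs at most once; with repeated combinations `reshape()` keeps only the first value and warns.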

Can you show a few lines of the original dataset so that we can test some code on it? — akrun, Nov 04 '15 at 11:16
See [here on how to give a reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610) — Jaap, Nov 04 '15 at 11:18
You can use `dput(droplevels(head(data)))` assuming that you have only 4 columns. — akrun, Nov 04 '15 at 11:19
Try `library(reshape2); dcast(data, subjectid~assessment+question_id, value.var='question_value')` — akrun, Nov 04 '15 at 11:21
@akrun It almost worked. The values are all 1 for some reason. I think it's because `class(data$question_value)` is a factor. — , Nov 04 '15 at 11:28
@user2105555 Try converting to `character` and then to numeric (if needed), i.e. `data$question_value <- as.numeric(as.character(data$question_value))`. You can also use the `fun.aggregate` argument in `dcast` to get the `mean`, `length`, etc. — akrun, Nov 04 '15 at 11:29
@akrun Didn't work. I even tried `as.numeric()` instead of `as.character()` — , Nov 04 '15 at 11:31
Can you update the question with a few lines of your dataset using `dput`, along with the expected output for that input, so that I can test it? — akrun, Nov 04 '15 at 11:33
@akrun Is there a way to convert some values with `as.numeric()` and others with `as.character()`? I found that some variables were coded as letters (e.g. gender --> M). — , Nov 04 '15 at 11:35
A column can have only a single class. If there is any character value, the whole column will be converted to `character` or `factor` (depending on whether `stringsAsFactors` is FALSE or TRUE). — akrun, Nov 04 '15 at 11:36
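
The factor conversion discussed in the comments above can be sketched as follows, using made-up values. Calling `as.numeric()` directly on a factor returns its internal level codes, while going through `as.character()` first parses the labels themselves; labels that are not numbers (such as `"M"`) become `NA`:

```r
# A factor stores integer level codes behind its labels.
# These values are invented for illustration.
f <- factor(c("10", "3", "M", "7"))

# Directly on the factor: the level codes, not the original numbers.
codes <- as.numeric(f)

# Via character: the labels parsed as numbers; "M" cannot be parsed and
# becomes NA (the coercion warning is silenced here on purpose).
values <- suppressWarnings(as.numeric(as.character(f)))

# If letter-coded answers must be kept, leave the column as character instead.
as_text <- as.character(f)
```

Because a column has a single class, a mixed column either stays character (keeping `"M"`) or becomes numeric (turning `"M"` into `NA`); which one is appropriate depends on how the wide table will be used.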
@akrun I updated the `dput` for the first dataset. What sort of expected output do you want? — , Nov 04 '15 at 11:41
Based on that dataset, what do you expect as output? In the 'assessment' column you have a long string, so I am confused. — akrun, Nov 04 '15 at 11:41
@akrun I just updated the original post. For some reason, the new dataset only has 1 as values, not the original numerical values from the original dataset. — , Nov 04 '15 at 11:52
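
One possible explanation for the wide table containing only counts such as 1, offered as an assumption rather than something confirmed in this thread: when a subjectid/assessment/question_id combination occurs more than once and no `fun.aggregate` is given, `dcast` from reshape2 falls back to `length`, so each cell holds the number of rows rather than the response. The row counts make this plausible: the data has 42106 rows, but a 96 by 431 grid has only 41376 cells, so if each question_id belongs to a single assessment some cells must receive more than one row. A base R sketch for finding such rows, on made-up data:

```r
# Toy data in which subject 1 answered Assessment1/Q1 twice (invented values).
long <- data.frame(
  subjectid      = c(1, 1, 1, 2),
  assessment     = c("Assessment1", "Assessment1", "Assessment1", "Assessment1"),
  question_id    = c("Q1", "Q1", "Q2", "Q1"),
  question_value = c("5", "4", "M", "3"),
  stringsAsFactors = FALSE
)

# The columns that should identify exactly one cell of the wide table.
key <- long[, c("subjectid", "assessment", "question_id")]

# Rows whose key occurs more than once, counting first and later copies alike.
is_dup <- duplicated(key) | duplicated(key, fromLast = TRUE)
dups   <- long[is_dup, ]

# One way to resolve them (an assumption about what is wanted):
# keep only the first answer for each key.
deduped <- long[!duplicated(key), ]
```

If `dups` is empty on the real data, duplicates are not the cause. Otherwise, either deduplicate before reshaping or pass an explicit `fun.aggregate` that suits the data.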
