How to convert comma-separated multiple responses into dummy coded columns in R

Question

In a survey, one question asked: "What aspect of the course helped you learn concepts the most? Select all that apply."

Here is what the list of responses looked like:

Student_ID = c(1,2,3)
Responses = c("lectures,tutorials","tutorials,assignments,lectures", "assignments,presentations,tutorials")
Grades = c(1.1,1.2,1.3)
Data = data.frame(Student_ID,Responses,Grades);Data

Student_ID | Responses                           | Grades
1          | lectures,tutorials                  | 1.1
2          | tutorials,assignments,lectures      | 1.2
3          | assignments,presentations,tutorials | 1.3

Now I want to create a data frame that looks something like this:

Student_ID | Lectures | Tutorials | Assignments | Presentations | Grades
1          |     1    |     1     |      0      |       0       |  1.1
2          |     1    |     1     |      1      |       0       |  1.2
3          |     0    |     1     |      1      |       1       |  1.3

I managed to separate the comma separated responses into columns, using the splitstackshape package. So currently my data looks like this:

Student_ID | Response 1  | Response 2    | Response 3 | Response 4 | Grades
1          | lectures    | tutorials     |     NA     |     NA     |   1.1
2          | tutorials   | assignments   | lectures   |     NA     |   1.2
3          | assignments | presentations | tutorials  |     NA     |   1.3

But as I stated earlier, I would like my table to look like the one I presented above, with dummy-coded columns. I am stuck on how to proceed. Perhaps one idea is to go through each observation in the response columns and write a 1 or 0 into a new data frame with lectures, tutorials, assignments and presentations as the headers?
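
The idea in the last sentence could be sketched in base R roughly as follows. This is only an illustration under assumptions: the wide data frame is rebuilt by hand, the names `wide`, `Response_1` to `Response_4`, `resp_cols`, `options` and `result` are hypothetical, and the list of options is typed out rather than derived from the data.

```r
# Wide layout rebuilt by hand; column names are illustrative assumptions
wide <- data.frame(
  Student_ID = c(1, 2, 3),
  Response_1 = c("lectures", "tutorials", "assignments"),
  Response_2 = c("tutorials", "assignments", "presentations"),
  Response_3 = c(NA, "lectures", "tutorials"),
  Response_4 = NA_character_,
  Grades     = c(1.1, 1.2, 1.3),
  stringsAsFactors = FALSE
)

# Columns that hold the split responses, and the options to flag (assumed list)
resp_cols <- grep("^Response_", names(wide), value = TRUE)
options   <- c("lectures", "tutorials", "assignments", "presentations")

# For each option, mark rows where it appears in any response column
dummies <- sapply(options, function(opt) {
  as.integer(rowSums(wide[resp_cols] == opt, na.rm = TRUE) > 0)
})

# Combine the 0/1 matrix with the identifier and grade columns
result <- data.frame(Student_ID = wide$Student_ID, dummies, Grades = wide$Grades)
result
```

Comparing the response columns against one option at a time avoids an explicit loop over rows; the `NA` cells are ignored through `na.rm = TRUE`.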

Is that how your data looks, with vertical bars and everything on one line? — AkselA, May 22 '19 at 20:27
Hi, no it doesn't, i'm sorry. I'm new to stack exchange and I thought it would look like a proper data frame but it ended up looking like that. Working on fixing it now. Should be done in a bit — Saneea Mustafa, May 22 '19 at 20:33
Try shooting your expected results in a .csv and then pulling it in as a DF and printing the head and updating your post with the output in R. Would help visualize it better :) — OctoCatKnows, May 22 '19 at 20:36
@BuffsGrad16 Thank you for your input! I fixed it before I saw your comment but that would have been of great help too :) — Saneea Mustafa, May 22 '19 at 20:41
So you've already read in the data using `read.csv()`, those vertical bars are just your formatting? — AkselA, May 22 '19 at 20:43
Yep, just to help visualise it here and keep it simple. The actual dataset is bigger than these few variables — Saneea Mustafa, May 22 '19 at 20:44
Possible duplicate of [Split data frame string column into multiple columns](https://stackoverflow.com/questions/4350440/split-data-frame-string-column-into-multiple-columns) — divibisan, May 23 '19 at 18:50

score 5 · Accepted Answer · edited Jan 20 '20 at 21:54

First, the Responses column is converted from factor to character class (from R 4.0.0 onward `data.frame()` no longer creates factors by default, so this step is then a harmless no-op). Each element of that column is then split on the comma. I don't know what all the possible responses are, so I used every response present in the data as the set of levels. Next, each split element is tabulated against those levels. The resulting list of tables is bound row-wise into a matrix and combined with the other columns of the original data.frame.

Data$Responses <- as.character(Data$Responses)
resp.split <- strsplit(Data$Responses, ",")

lev <- unique(unlist(resp.split))

resp.dummy <- lapply(resp.split, function(x) table(factor(x, levels=lev)))

Data2 <- with(Data, data.frame(Student_ID, do.call(rbind, resp.dummy), Grades))
Data2
#   Student_ID lectures tutorials assignments presentations Grades
# 1          1        1         1           0             0    1.1
# 2          2        1         1           1             0    1.2
# 3          3        0         1           1             1    1.3
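
One caveat worth sketching: `table()` counts occurrences, so if a respondent's string listed the same option twice, that cell would hold 2 rather than 1. A variant that yields strict 0/1 flags might look like the following. It assumes (this is not stated in the question) that duplicates or spaces after commas can occur; `resp.flag` and `Data3` are hypothetical names.

```r
# Rebuild the example data so the block runs on its own
Student_ID <- c(1, 2, 3)
Responses  <- c("lectures,tutorials", "tutorials,assignments,lectures",
                "assignments,presentations,tutorials")
Grades     <- c(1.1, 1.2, 1.3)
Data <- data.frame(Student_ID, Responses, Grades, stringsAsFactors = FALSE)

# Split on commas and trim stray whitespace (assumed possible in real data)
resp.split <- lapply(strsplit(Data$Responses, ",", fixed = TRUE), trimws)
lev <- unique(unlist(resp.split))

# lev %in% x is TRUE/FALSE per option, so repeated options still give 1
# vapply returns one column per respondent, hence the transpose
resp.flag <- t(vapply(resp.split, function(x) as.integer(lev %in% x),
                      integer(length(lev))))
colnames(resp.flag) <- lev

Data3 <- data.frame(Student_ID = Data$Student_ID, resp.flag, Grades = Data$Grades)
Data3
```

For the example data this gives the same table as above; the two versions differ only when an option is repeated within one response.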

thank you for your time and effort, i really appreciate it. This was also very helpful and I shall be trying this out with my code too. Thank you very much — Saneea Mustafa, May 22 '19 at 21:06
Neat! To avoid the `do.call(rbind(.))` try `sapply`: `with(Data, data.frame(Student_ID, t(sapply(resp.split, function(x) table(factor(x, levels=lev))))))`. — jay.sf, Aug 10 '20 at 06:24

score 1 · Answer 2 · answered May 22 '19 at 21:04

I found a solution to my question. I initially did:

library(splitstackshape)
Responses = cSplit(Data, "Responses",",")

Then I added the following line:

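library(qdapTools)
# TA is not defined above; assuming it holds only the split response columns
# (cSplit names them Responses_1, Responses_2, ...)
TA <- as.data.frame(Responses)[, grep("^Responses_", names(Responses))]
TA <- mtabulate(as.data.frame(t(TA)))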

It worked for me.

Saneea Mustafa
