I am working with survey responses from Qualtrics and analyzing the data in R. One question, a multiple-answer multiple-choice question, outputs the numeric response choices, separated by commas, into one cell. For example, a person who picked choices 4, 7 and 10 has output that looks like "4,7,10" or "10,4,7", stored as a character string in R. For some reason, the order of the choices varies from response to response.

I was able to use the splitstackshape package's "cSplit" command to split all of these values into multiple columns. There are 22 possible choices, so the single column (let's call it IM) was split into 22 different columns each holding one value (e.g. IM_01, IM_02...IM_22).

For the example response I gave above that came out as "10,4,7", IM_01 = 10, IM_02 = 4, IM_03 = 7 and IM_04 through IM_22 are NA. So the problem here is that the 4s are not all in one column, and neither are the 7s or any other value. The values sit in the columns in the order in which they originally appeared in the comma-separated string.

Here is a simplified, small df as an example of what I am dealing with. There are seven possible choices in this example.

exampledf <- data.frame(ID = 1:3, Response = c("4,7,10", "7,5,16,8", "2,10"), 
stringsAsFactors = FALSE)

  ID Response
1  1   4,7,10
2  2 7,5,16,8
3  3     2,10

A good way to sort them, I imagine, would be to make one column for each possible choice and set a cell in that column to 1 if it corresponds with one of the choices in that row. The intended outcome would look something like this:

  ID Response IM2 IM4 ... IM10 IM16
1  1   4,7,10  NA   1 ...    1   NA
2  2 7,5,16,8  NA  NA ...   NA    1
3  3     2,10   1  NA ...    1   NA

Now I did find a way to do this for one column with the following code:

exampledf$IM4 <- NA

within(exampledf, IM4[IM_02 == 4 | IM_04 == 4 | IM_05 == 4
                      | IM_07 == 4 | IM_08 == 4 | IM_10 == 4
                      | IM_16 == 4] <- 1)

But I can't find a way to do this for all columns at once without copying and pasting the code over and over and changing the logical statements to equal the relevant choice for each copied block. I also tried turning this into a function...

assignment <- function(cat, n) {
  within(exampledf, cat[IM_02 == n | IM_04 == n | IM_05 == n
                        | IM_07 == n | IM_08 == n | IM_10 == n
                        | IM_16 == n] <- 1)
}

...but I can't figure out how to successfully pass the two arguments (category and category number) to the function.

Any thoughts on how to accomplish this, either using the function I started or a different way entirely?

Thanks a lot!

Donovan192
  • A second dup (http://stackoverflow.com/questions/16267552/dummy-variables-from-a-string-variable) – HubertL Sep 15 '16 at 00:36

1 Answer

Split the response vector on commas:

# lapply always returns a list, even when every response has the same number of choices
exampledf$split_responses <- lapply(strsplit(exampledf$Response, ','), as.numeric)

Compose each ID-response vector pair into individual dataframes, and concatenate them row-wise:

xx = do.call(rbind,apply(exampledf,1,function(x) data.frame(x$ID, x$split_responses)))

Add a column for what value you'd like the columns to take:

xx$value = 1

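Then use tidyr's spread() to reshape the result into wide format (spread() has been superseded by pivot_wider() since tidyr 1.0.0 but is still available):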

library(tidyr)
spread(xx,key=x.split_responses,value=value)

  x.ID  2  4  5  7  8 10 16
1    1 NA  1 NA  1 NA  1 NA
2    2 NA NA  1  1  1 NA  1
3    3  1 NA NA NA NA  1 NA
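
For comparison, here is a minimal base R sketch that produces the layout asked for in the question (columns named IM2, IM4, ... holding 1 or NA) without any reshaping packages. It is only one possible approach, not part of the tidyr solution above. The names `choices`, `all_choices` and `picked` are made up for illustration, and the column prefix "IM" is taken from the question.

```r
exampledf <- data.frame(ID = 1:3,
                        Response = c("4,7,10", "7,5,16,8", "2,10"),
                        stringsAsFactors = FALSE)

# Split each response string into a numeric vector of the chosen options
choices <- lapply(strsplit(exampledf$Response, ","), as.numeric)

# Every option that occurs at least once, in ascending order
all_choices <- sort(unique(unlist(choices)))

# One indicator column per option: 1 if the row picked it, NA otherwise
for (n in all_choices) {
  picked <- vapply(choices, function(ch) n %in% ch, logical(1))
  exampledf[[paste0("IM", n)]] <- ifelse(picked, 1, NA)
}

exampledf
```

Because `[[` accepts the column name as a string, the loop avoids having to pass a bare column name into `within()`. To get a column for every possible choice, including ones nobody picked, `all_choices` could be replaced with the full range (for instance `1:22` for the real survey).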
Patrick McCarthy