As part of piloting a survey, I presented each Turker with sets of choices amongst four alternatives. The data looks like this:
> so
  WorkerId pio_1_1 pio_1_2 pio_1_3 pio_1_4 pio_2_1 pio_2_2 pio_2_3 pio_2_4
1        1     Yes      No      No      No      No      No     Yes      No
2        2      No     Yes      No      No     Yes      No     Yes      No
3        3     Yes     Yes      No      No     Yes      No     Yes      No
I'd like it to look like this:
WorkerId set pio1 pio2 pio3 pio4
       1   1  Yes   No   No   No
       1   2   No   No  Yes   No
...
I can kludge through this by a number of means, none of which seem very elegant:
- Swapping the order of the numbers with regexes and backreferencing and then using reshape()
- Writing my own little function to parse out the first digit between the underscores and then reshape it long
- Splitting and then stacking the columns (relies on the ordering being right)
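For concreteness, here is a minimal sketch of the first option in base R. It assumes every choice column is named pio_<set>_<alternative>; the names `wide` and `long` are made up for the sketch, and the sample data is rebuilt compactly (as character columns rather than factors) so the snippet runs on its own.

```r
# Compact re-creation of the sample data (character columns, not factors;
# an assumption made only to keep the sketch short)
so <- data.frame(
  WorkerId = 1:3,
  pio_1_1 = c("Yes", "No", "Yes"),
  pio_1_2 = c("No", "Yes", "Yes"),
  pio_1_3 = c("No", "No", "No"),
  pio_1_4 = c("No", "No", "No"),
  pio_2_1 = c("No", "Yes", "Yes"),
  pio_2_2 = c("No", "No", "No"),
  pio_2_3 = c("Yes", "Yes", "Yes"),
  pio_2_4 = c("No", "No", "No"),
  stringsAsFactors = FALSE
)

wide <- so
# Swap the two indices with backreferences:
# pio_<set>_<alt> becomes pio<alt>.<set>, which reshape() can split on "."
names(wide) <- sub("^pio_([0-9]+)_([0-9]+)$", "pio\\2.\\1", names(wide))

# Let reshape() guess the variable names (pio1..pio4) and the times (the sets)
long <- reshape(wide, direction = "long",
                varying = names(wide)[-1],  # every column except WorkerId
                sep = ".", timevar = "set", idvar = "WorkerId")

# Order by worker, then set, and drop the generated row names
long <- long[order(long$WorkerId, long$set), ]
rownames(long) <- NULL
long
```

On the sample data this should give six rows, two per worker, with columns WorkerId, set, pio1 through pio4. It works, but the renaming step is exactly the kind of kludge I'd like to avoid.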
But it seems to me that all of these ignore the idea that data in what you might call "double wide" format has its own structure. I'd love to use the reshape2 package for this, but despite the data having been produced with cast() I don't see any options that would help me truly melt this data.frame back.
Suggestions welcome.
so <- structure(list(WorkerId = 1:3, pio_1_1 = structure(c(2L, 1L,
2L), .Label = c("No", "Yes"), class = "factor"), pio_1_2 = structure(c(1L,
2L, 2L), .Label = c("No", "Yes"), class = "factor"), pio_1_3 = structure(c(1L,
1L, 1L), .Label = c("No", "Yes"), class = "factor"), pio_1_4 = structure(c(1L,
1L, 1L), .Label = "No", class = "factor"), pio_2_1 = structure(c(1L,
2L, 2L), .Label = c("No", "Yes"), class = "factor"), pio_2_2 = structure(c(1L,
1L, 1L), .Label = c("No", "Yes"), class = "factor"), pio_2_3 = structure(c(2L,
2L, 2L), .Label = c("No", "Yes"), class = "factor"), pio_2_4 = structure(c(1L,
1L, 1L), .Label = "No", class = "factor")), .Names = c("WorkerId",
"pio_1_1", "pio_1_2", "pio_1_3", "pio_1_4", "pio_2_1", "pio_2_2",
"pio_2_3", "pio_2_4"), row.names = c(NA, 3L), class = "data.frame")