I am dealing with some slightly awkward survey data. The output format from the survey software is:
Respondent contact1 contact2 question1 question2 question1 question2
Tim Bob Alan 1 0 0 1
Where:
Contact1 and contact2 are slots where the respondent can list individuals they have had contact with.
The following fixed set of questions then loop for each contact, each being a new column. Answers are recorded as simply a 1 or 0.
I wish to reshape this data to a more analytical friendly layout:
Respondent Contact question1 question2
Tim Bob 1 0
Tim Alan 0 1
Obviously this is simplified, the actual dataset has 100 contacts and approx 8 questions each but the layout is the same.
I believe the best approach is to use a combination of gather(), express() and spread()
data %>%
gather(key, value, -Respondent) %>%
extract(key, c("question", "contact"), "reg ex") %>%
spread(question, value)
But the differing lengths and multiple contacts is proving difficult to align.