0

I am dealing with some slightly awkward survey data. The output format from the survey software is:

Respondent     contact1     contact2     question1     question2     question1     question2
   Tim           Bob          Alan           1             0             0             1

Where:

Contact1 and contact2 are slots where the respondent can list individuals they have had contact with.

The following fixed set of questions then loop for each contact, each being a new column. Answers are recorded as simply a 1 or 0.

I wish to reshape this data to a more analytical friendly layout:

Respondent     Contact    question1    question2
   Tim           Bob          1            0
   Tim           Alan         0            1

Obviously this is simplified, the actual dataset has 100 contacts and approx 8 questions each but the layout is the same.

I believe the best approach is to use a combination of gather(), express() and spread()

data %>%
  gather(key, value, -Respondent) %>%
  extract(key, c("question", "contact"), "reg ex") %>%
  spread(question, value)

But the differing lengths and multiple contacts is proving difficult to align.

RAH
  • 395
  • 2
  • 9

1 Answers1

2

Remove numbers from "contact" columns and then you can use pivot_longer.

names(df) <- sub('(?<=contact)\\d+', '', names(df), perl = TRUE)

tidyr::pivot_longer(df, cols = -Respondent, names_to = '.value')

#  Respondent contact question1 question2
#  <chr>      <chr>       <int>     <int>
#1 Tim        Bob             1         0
#2 Tim        Alan            0         1

data

df <- structure(list(Respondent = "Tim", contact1 = "Bob", contact2 = "Alan", 
    question1 = 1L, question2 = 0L, question1 = 0L, question2 = 1L), 
   class = "data.frame", row.names = c(NA, -1L))
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213