Following my earlier question: R: reshape/gather function to create dataset ready for multilevel analysis
I discovered it is a bit more complicated. My dataset is actually 'messier' than I hoped. So here's the full story: I have a big dataset, 240 cases. Each row is a case (breast cancer patient). Somewhere at the end of the dataset(say from column 417 onwards) I have partner data of the patients, that also filled in a questionnaire. In the beginning, there are demographic variables for both patients and partners, followed by test outcomes only of patients, thus followed by partner data.
I want to create a dataset, where I 'split' the patient and partner data, but keep it coupled. Thus: I want to duplicate the subject ID and create new column with 1s and 2s (1 corresponding to patient and 2 to partner). Then, I want my data actually as it is now, but some variables can be matched though (for example, I know have "date of birth" for patient [pgebdat] and for partner [prgebdat] separate. Ofcourse, I can turn this into 'gebdat' with the two birth dates below each other.
This code worked for me for a small subset of my data:
mydf_long <- mydf4 %>%
unite(bb1:bb50rec, col = `1`, sep = ";") %>% # Combine responses of 'p1' through 'p3'
unite(pbb1:pbb50recM, col = `2`, sep = ";") %>% # Combine responses of 'pr1' through 'pr3'
gather(couple, value, `1`:`2`) %>% # Form into long data
separate(value, sep = ";", into = c(paste0("bb", seq(1:104),"", sep = ','))) %>% # Separate and retrieve original answers
arrange(id)
results in:
id groep_MNC zkhs fbeh pgebdat couple bb1,
1 3 1 1 1 1955-12-01 1 4
2 3 1 1 1 1955-12-01 2 5
3 5 1 1 1 1943-04-09 1 2
4 5 1 1 1 1943-04-09 2 2
But now it copies and pastes the date of birth of the patient also to 'partner' row.
I'm stuck, and don't even quite know what data you would need to be able to answer my question, so please do ask. I'll provide something of an example below:
Example of data
id groep_MNC zkhs fbeh pgebdat p_age pgesl prgebdat pr_age prgesl relpnst
1 3 1 1 1 1955-12-01 42.50000 1 <NA> NA 2 1
2 5 1 1 1 1943-04-09 55.16667 1 1962-04-18 36.50000 1 2
3 7 1 1 1 1958-04-10 40.25000 1 <NA> NA 2 1
4 10 1 1 1 1958-04-17 40.25000 1 1957-07-31 41.33333 2 1
5 12 1 1 2 1947-11-01 50.66667 1 1944-06-08 54.58333 2 1
And then, after couple of hundred variables for only patients, this partner data comes along:
pbb1 pbb2 pbb3 pbb4 pbb5 pbb6 pbb7 pbb8 pbb9
1 5 5 5 5 2 5 4 2 3
2 2 1 4 1 3 4 3 3 4
3 5 3 4 4 4 3 5 3 4
4 5 3 5 5 5 5 4 4 4
5 5 5 5 5 5 4 4 3 4
note, I didn't create this dataset myself - I'm just here to tidy up the mess :)
Edit: The dataset is in dutch. Pgesl = gender for patient, prgesl = gender for partner... etc.