1

I am a beginner, confronted with a big task and all the typical long to wide reshaping tools I found using the search function did not really do the job for me. I would be glad if someone could help me.

I try to achieve the following:

I have patientdata in which every patient has a unique patient number but multiple stays in hospital have lead to multiple cases per person. I want to work with these cases. Problem is, I have all the diagnoses per case but not everybody has the same number of diagnosis and I don't know how to tell R to create a new dagnosis (and date of diagnosis) variable each time there is already a diagnosis. Every help is highly appreciated!

So, I have a huge dataset that looks roughly like that:

        Patient Case Diagnosis DateOfDiagnosis
1      John Doe    1         A      2010-10-10
2      John Doe    1         B      2010-10-10
3      John Doe    1         C      2010-10-10
4 Peter Griffin    2         D      2010-10-11
5 Peter Griffin    2         E      2010-10-11
6 Homer Simpson    3         F      2010-10-12
7 Homer Simpson    4         G      2010-10-13

I need row by case and I need all the diagnosis and their dates in separate variables. This would be no problem but there is no pattern in the cases or diagnosis so some patients have only one case others 5 and some cases have 1 others 5 diagnoses with respective date. So what I need looks like this:

        Patient Case Diag1 DateOfDiag1 Diag2 DateOfDiag2 Diag3 DateOfDiag3 ....
1      John Doe    1   A   2010-10-10    B   2010-10-10    C   2010-10-10
2      Peter Grif  2   D   2010-10-11    E   2010-10-11    NA        NA 
3      Homer Simp  3   F   2010-10-12    NA     NA         NA        NA
4      Homer Simp  4   G   2010-10-13    NA     NA         NA        NA

The code for my example is:

Patient <- c('John Doe','John Doe','John Doe', 'Peter Griffin','Peter Griffin', 'Homer Simpson', 'Homer Simpson')
Case <- c(1,1,1,2,2,3,4)
Diagnosis <- c('A','B','C','D','E','F','G')
DateOfDiagnosis <- as.Date(c('2010-10-10','2010-10-10','2010-10-10','2010-10-11','2010-10-11','2010-10-12','2010-10-13')) 


df<-data.frame(Patient, Case, Diagnosis, DateOfDiagnosis)

Every help is highly appreciated! Kind regards, Jan

1 Answers1

1

You could use pivot_wider, after creating a unique column.

library(dplyr)
library(tidyr)

df %>%
  group_by(Patient, Case) %>%
  mutate(row = row_number()) %>%
  pivot_wider(values_from =  c(Diagnosis, DateOfDiagnosis), names_from = row)

#   Patient        Case Diagnosis_1 Diagnosis_2 Diagnosis_3 DateOfDiagnosis_1 DateOfDiagnosis_2 DateOfDiagnosis_3
#  <fct>         <dbl> <fct>       <fct>       <fct>       <date>            <date>            <date>           
#1 John Doe          1 A           B           C           2010-10-10        2010-10-10        2010-10-10       
#2 Peter Griffin     2 D           E           NA          2010-10-11        2010-10-11        NA               
#3 Homer Simpson     3 F           NA          NA          2010-10-12        NA                NA               
#4 Homer Simpson     4 G           NA          NA          2010-10-13        NA                NA     
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • Thank you to both of you! The combination just solved my problem as far as I understand by now. – Jan Brederecke Dec 14 '19 at 10:42
  • It was of great help but I have a follow up question: Say I have a second dataset with diagnoses and the first is not complete but the best source of information, how could I fill in NA in the dataset with diagnoses and dates of diagnosis from my second dataset? The second dataset might also have more and sometimes different diagnoses which should be included in the final dataset. Do you have an idea on how to merge the two datasets in this way? Thank you for your great advice so far! – Jan Brederecke Dec 15 '19 at 18:22
  • Please ask a new question and include all these details in it. – Ronak Shah Dec 15 '19 at 22:51