0

I have extracted some data from a non-normalised ms access database (from an old system and no data querying has been performed before). And since there are multiple duplicated foreign keys in a table the data looks like the below table when performing a join.

--------------------------------------------------------------------------------------------------------------------------
|PtID|ID. NO  |Name     |DOB        |Sex|AdmDate        |FK|Diagnosis               |DiagCode|
|------------------------------------------------------------------------------------------------------------------------|
|8989|0099999G|John Smith|11/28/1930|M  |3/6/2018 11:22 |8989|Atrial fibrillation and flutter   |I48     |
|8989|0099999G|John Smith|11/28/1930|M  |3/6/2018 11:22 |8989|Bacterial pneumonia, unspecified  |J15.9   |
|8989|0099999G|John Smith|11/28/1930|M  |3/6/2018 11:22 |8989|Intracardiac thrombosis               |I51.3   |
--------------------------------------------------------------------------------------------------------------------------

I attempted to unite() the diagnosis and diagnosis code which does work. However the issue when performing the function below is with the spread function.

TRY <- MGW %>% 
  gather(Diagnosis, DiagCode, -(PtID:DiagCode)) %>%
  group_by(PtID) %>%
  unite(temp, Diagnosis, DiagCode) %>%
  mutate(rn = paste0("Diagnosis",row_number())) %>%
  spread(rn, temp)

For some reason the diagnosis2 is being populated in the below row therefore the duplicates are not being removed

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|PtID|ID. NO  |Name     |DOB        |Sex|AdmDate        |FK|Diagnosis1           |Diagnosis2                             |Diagnosis3              |
|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|8989|0099999G|John Smith|11/28/1930|M  |3/6/2018 11:22 |8989|Atrial fibrillation and flutter_I48|                        Intracardiac thrombosis_I51.3|
|8989|0099999G|John Smith|11/28/1930|M  |3/6/2018 11:22                       Bacterial pneumonia, unspecified_J15.9|                 |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------

I would appreciate any help, thanks in advance

EP89
  • 5
  • 3
  • I hope John Smith is not John Smith :-) – Andre Elrico Mar 14 '18 at 12:23
  • yes there are several functions out there. I recommend to use `tidyr`'s `gather` and `spread` functions. – Roman Mar 14 '18 at 12:30
  • I've tried the code you included in your updated question. It gives the desired result for me. Did you update your packages? Or did you try any of the other solutions posted in the duplicate link? – Jaap Mar 16 '18 at 13:50
  • @Jaap i did try the other solutions but still did not work... I will update to the latest versions and give it a try. thx – EP89 Mar 20 '18 at 06:43

0 Answers0