0

I am trying to clean this dataset and the original data is formatted terribly. This is the format that the original data came in:

PATIENT_CODE     PATIENT_NAME     TEST           TEST_RESULT
1                JOHN SMITH       URIC ACID      5
1                JOHN SMITH       GLUCOSE        6

This is the format I would like to use:

PATIENT_CODE     PATIENT_NAME    URIC ACID     GLUCOSE
1                JOHN SMITH      5             6

There are 10+ tests ("TEST" variable) for each patient, and not all patients received all of the tests. How do I "unmelt" this data to get the desired dataset?

jay.sf
  • 60,139
  • 8
  • 53
  • 110
Carmen
  • 11
  • 3
    `library(tidyverse); df1 %>% spread(TEST, TEST_RESULT)` – Mako212 Jan 28 '19 at 19:36
  • Or with `reshape2`: `dcast(df1, PATIENT_CODE + PATIENT_NAME ~ TEST, value.var = "TEST_RESULT")` – Mako212 Jan 28 '19 at 19:38
  • 3
    Possible duplicate of [R Reshape data frame from long to wide format?](https://stackoverflow.com/questions/5890584/how-to-reshape-data-from-long-to-wide-format) – HFBrowning Jan 28 '19 at 19:39
  • @Mako212 Here is my line of code: df2<-spread(df1, TEST, TEST_RESULT) but when I do this I get the error code "Duplicate identifiers for rows" and the results do not spread out. Rather, the number "2" makes a diagonal line across the spreaded TEST. What am I doing incorrectly? – Carmen Jan 28 '19 at 21:27

1 Answers1

0

This is a dcast()

library(reshape2)

> dcast(df1, ... ~ TEST, value.var="TEST_RESULT")
  PATIENT_CODE PATIENT_NAME GLUCOSE URIC ACID
1            1  JOHN SMITH        6         5

Data

df1 <- structure(list(PATIENT_CODE = c(1L, 1L), PATIENT_NAME = structure(c(1L, 
1L), .Label = "JOHN SMITH ", class = "factor"), TEST = structure(2:1, .Label = c("GLUCOSE", 
"URIC ACID"), class = "factor"), TEST_RESULT = 5:6), row.names = c(NA, 
-2L), class = "data.frame")
jay.sf
  • 60,139
  • 8
  • 53
  • 110