0

I have a data of patients' operations/procedures (example as shown in the picture below) where one row describes a patient's procedure. There are 2 levels of information,

  1. the first being the operation details, i.e. op_start_dt, priority_operation and asa_status
  2. the second being the procedure details, i.e. proc_desc and proc_table

An operation can have more than 1 procedures. In the example below, patient A has 2 operations (defined by distinct op_start_dt). In his first operation, he had 1 procedure (defined by distinct proc_desc) and in his second, he had 2 procedures.

enter image description here

I would like to convert the data into a wide format, where a patient only has one row, and his information will be arranged operation by operation and within each operation, it will be arrange procedure by procedure, as shown below. So, proc_descxy refers to the proc_desc on xth operation and yth procedure.

enter image description here

Data:

df <- structure(list(patient = c("A", "A", "A"), department = c("GYNAECOLOGY /OBSTETRICS DEPT", 
"GYNAECOLOGY /OBSTETRICS DEPT", "GYNAECOLOGY /OBSTETRICS DEPT"
), op_start_dt = structure(c(1424853000, 1424870700, 1424870700
), class = c("POSIXct", "POSIXt"), tzone = "UTC"), priority_operation = c("Elective", 
"Elective", "Elective"), asa_status = c(2, 3, 3), proc_desc = c("UTERUS, MALIGNANT CONDITION, EXTENDED HYSTERECTOMY WITH/WITHOUT LYMPHADENECTOMY", 
"KIDNEY AND URETER, VARIOUS LESIONS, NEPHROURETERECTOMY, LAPAROSCOPIC", 
"HEART, VARIOUS LESIONS, HEART TRANSPLANTATION"), proc_table = c("99", 
"6A", "7C")), row.names = c(NA, 3L), class = "data.frame")

Desired output:

df <- structure(list(patient = "A", department = "GYNAECOLOGY /OBSTETRICS DEPT", 
    no_op = 2, op_start_dt1 = structure(1424853000, class = c("POSIXct", 
    "POSIXt"), tzone = "UTC"), no_proc1 = 1, priority_operation1 = "Elective", 
    asa_status1 = 2, proc_desc11 = "UTERUS, MALIGNANT CONDITION, EXTENDED HYSTERECTOMY WITH/WITHOUT LYMPHADENECTOMY", 
    proc_table11 = "99", op_start_dt2 = structure(1424870700, class = c("POSIXct", 
    "POSIXt"), tzone = "UTC"), no_of_proc2 = 2, priority_operation2 = "Elective", 
    asa_status2 = 3, proc_desc21 = "KIDNEY AND URETER, VARIOUS LESIONS, NEPHROURETERECTOMY, LAPAROSCOPIC", 
    proc_table21 = "6A", proc_desc22 = "HEART, VARIOUS LESIONS, HEART TRANSPLANTATION", 
    proc_table22 = "7C"), row.names = 1L, class = "data.frame")

My attempt: I tried to work this out, but it gets confusing along the way, with pivot_longer then pivot_wideragain.

df %>%
  # Operation-level Information
  group_by(patient) %>%
  mutate(op_nth = dense_rank(op_start_dt),
         no_op = n_distinct(op_start_dt)) %>%

  # Procedure-level Information
  group_by(patient, op_start_dt) %>% 
  mutate(proc_nth = row_number(),
         no_proc = n_distinct(proc_desc)) %>% 
  ungroup() %>% 

  # Make pivoting easier
  mutate_all(as.character) %>% 

  # Pivot Procedure-level Information
  pivot_longer(-c(patient, department, no_op, op_nth, proc_nth)) %>%

  # Remove the indices for "Procedure" for Operation_level Information
  mutate(proc_nth = case_when(!(name %in% c("op_start_dt", "no_proc", "priority_operation", "asa_status")) ~ proc_nth)) %>% 

  # Create the column names
  unite(name, c(name, op_nth, proc_nth), sep = "", na.rm = TRUE) %>% 
  distinct() %>% 

  pivot_wider(names_from = name, values_from = value) 
HNSKD
  • 1,614
  • 2
  • 14
  • 25

1 Answers1

0

Create a unique ID column for each patient and then use pivot_wider.

library(dplyr)

df %>%
  group_by(patient) %>%
  mutate(row = row_number()) %>%
  tidyr::pivot_wider(names_from = row, values_from = op_start_dt:proc_table)
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213