I have a large R data frame (>1 million records with >1,000 variables) that captures information about patient visits to hospitals. Included in this data frame are 30 different procedure statuses, locations, and codes: ProcedureStatus01, ProcedureLocation01, ProcedureCode01, ProcedureStatus02, ProcedureLocation02, ProcedureCode02, etc. I need to create a new set of 30 variables representing "final" procedure codes based on the following logic:
- Where the procedure status is "Complete" and the procedure location is "Onsite", the original procedure code is kept as the final procedure code; and
- Where the procedure code status is not "Complete" (e.g. "Cancelled") or the procedure location is not "Onsite" (e.g. "Offsite"), the final procedure code variable created is blank.
Creating these final variable values will allow me to search completed, onsite procedure codes for specific codes and create new variables based on those final codes.
The following code (using only 5 records and 3 procedure statuses/locations/codes as an example) produces the results that I am expecting, but applying this simple method to my larger data frame (30 procedures) quickly becomes unmanageable and invites the risk of error from copying and pasting 30 times. Is there a preferred approach to produce the same results in a more efficient way? I have found similar questions regarding how to create a new variable using multiple other variable values, but not for many variable combinations at once.
# Load dplyr Package
library(dplyr)
# Create Sample Data Frame
ProcedureStatus01 <- c("Complete", "Cancelled", "Complete", "Scheduled", "Scheduled")
ProcedureLocation01 <- c("Offsite", "Onsite", "Onsite", "Offsite", "Onsite")
ProcedureCode01 <- c("A123", "A123", "A500", "B296", "C901")
ProcedureStatus02 <- c("Complete", "Complete", "Complete", "Scheduled", "Scheduled")
ProcedureLocation02 <- c("Onsite", "Onsite", "Onsite", "Onsite", "Onsite")
ProcedureCode02 <- c("Z562", "A500", "G164", "V413", "A123")
ProcedureStatus03 <- c("Cancelled", "Complete", "Complete", "Cancelled", "Cancelled")
ProcedureLocation03 <- c("Onsite", "Onsite", "Onsite", "Onsite", "Onsite")
ProcedureCode03 <- c("P524", "W412", "A123", "C901", "Z554")
DataFrame <- data.frame(ProcedureStatus01, ProcedureLocation01, ProcedureCode01, ProcedureStatus02, ProcedureLocation02, ProcedureCode02, ProcedureStatus03, ProcedureLocation03, ProcedureCode03)
# Add Completed Onsite Procedure Code Variables
DataFrame <- DataFrame %>% mutate(ProcedureCodeFinal01 = if_else(ProcedureStatus01 == "Complete" & ProcedureLocation01 == "Onsite", ProcedureCode01, ""))
DataFrame <- DataFrame %>% mutate(ProcedureCodeFinal02 = if_else(ProcedureStatus02 == "Complete" & ProcedureLocation02 == "Onsite", ProcedureCode02, ""))
DataFrame <- DataFrame %>% mutate(ProcedureCodeFinal03 = if_else(ProcedureStatus03 == "Complete" & ProcedureLocation03 == "Onsite", ProcedureCode03, ""))