This question is very similar to one I previously asked here: "R, find character string from vector, create new TRUE/FALSE columns". That solution worked perfectly, but now I have an added twist. Here is a sample of my data:
df<-structure(list(Date = c("5/20/2019", "5/20/2019", "5/20/2019",
"5/20/2019", "5/20/2019", "5/20/2019", "5/20/2019", "5/20/2019",
"10/22/2018", "5/20/2019"), ESRD_1 = c("CKD (chronic kidney disease), stage III [N18.30]; CKD (chronic kidney disease), stage III [N18.30]; Type 2 diabetes mellitus [E11.9]",
NA, NA, NA, NA, NA, NA, NA, NA, NA), ESRD_10 = c(NA, NA, NA,
NA, NA, NA, NA, NA, "End stage renal disease on dialysis [N18.6, Z99.2]; End stage renal disease on dialysis [N18.6, Z99.2]; Type 2 diabetes mellitus with stage 4 chronic kidney disease, with long-term current use of insulin [E11.22, N18.4, Z79.4]; Type 2 diabetes mellitus with stage 4 chronic kidney disease, with long-term current use of insulin [E11.22, N18.4, Z79.4]; Type 2 diabetes mellitus with stage 4 chronic kidney disease, with long-term current use of insulin [E11.22, N18.4, Z79.4]",
NA), ESRD_11 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, "ESRD (end stage renal disease) on dialysis [N18.6, Z99.2]; ESRD (end stage renal disease) on dialysis [N18.6, Z99.2]; Peripheral vascular disease due to secondary diabetes [E13.51]; Type 2 diabetes mellitus with chronic kidney disease on chronic dialysis, with long-term current use of insulin [E11.22, N18.6, Z99.2, Z79.4]; Type 2 diabetes mellitus with chronic kidney disease on chronic dialysis, with long-term current use of insulin [E11.22, N18.6, Z99.2, Z79.4]; Type 2 diabetes mellitus with chronic kidney disease on chronic dialysis, with long-term current use of insulin [E11.22, N18.6, Z99.2, Z79.4]"
), ESRD_12 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), ESRD_13 = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA), ESRD_2 = c(NA, "CKD (chronic kidney disease), stage III [N18.30]; CKD (chronic kidney disease), stage III [N18.30]; Diabetic gastroparesis [E11.43, K31.84]; Type 2 diabetes mellitus [E11.9]",
NA, NA, NA, NA, NA, NA, NA, NA), ESRD_3 = c(NA, NA, "CKD (chronic kidney disease), stage III [N18.30]; CKD (chronic kidney disease), stage III [N18.30]; Type 2 diabetes mellitus [E11.9]",
NA, NA, NA, NA, NA, NA, NA), ESRD_4 = c(NA, NA, NA, "CKD (chronic kidney disease), stage III [N18.30]; CKD (chronic kidney disease), stage III [N18.30];",
NA, NA, NA, NA, NA, NA), ESRD_5 = c(NA, NA, NA, NA, "CKD (chronic kidney disease), stage II [N18.20]; CKD (chronic kidney disease), stage II [N18.20]; Type 2 diabetes mellitus [E11.9]",
NA, NA, NA, NA, NA), ESRD_6 = c(NA, NA, NA, NA, NA, "CKD (chronic kidney disease), stage III [N18.30]; CKD (chronic kidney disease), stage III [N18.30]; Type 2 diabetes mellitus [E11.9]",
NA, NA, NA, NA), ESRD_7 = c(NA, NA, NA, NA, NA, NA, "CKD (chronic kidney disease), stage III [N18.30]; CKD (chronic kidney disease), stage III [N18.30]; Diabetic gastroparesis [E11.43, K31.84]; Type 2 diabetes mellitus [E11.9]",
NA, NA, NA), ESRD_8 = c(NA, NA, NA, NA, NA, NA, NA, "CKD (chronic kidney disease), stage II [N18.20]; CKD (chronic kidney disease), stage II [N18.20];",
NA, NA), ESRD_9 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)), class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -10L), spec = structure(list(
cols = list(Date = structure(list(), class = c("collector_character",
"collector")), ESRD_1 = structure(list(), class = c("collector_character",
"collector")), ESRD_10 = structure(list(), class = c("collector_character",
"collector")), ESRD_11 = structure(list(), class = c("collector_character",
"collector")), ESRD_12 = structure(list(), class = c("collector_logical",
"collector")), ESRD_13 = structure(list(), class = c("collector_logical",
"collector")), ESRD_2 = structure(list(), class = c("collector_character",
"collector")), ESRD_3 = structure(list(), class = c("collector_character",
"collector")), ESRD_4 = structure(list(), class = c("collector_character",
"collector")), ESRD_5 = structure(list(), class = c("collector_character",
"collector")), ESRD_6 = structure(list(), class = c("collector_character",
"collector")), ESRD_7 = structure(list(), class = c("collector_character",
"collector")), ESRD_8 = structure(list(), class = c("collector_character",
"collector")), ESRD_9 = structure(list(), class = c("collector_logical",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1L), class = "col_spec"))
I want to search those "ESRD" columns for specific patterns and create TRUE/FALSE columns indicating whether each pattern was found. For instance, R would search the ESRD columns for either the word "diabetes" or the code "E11.9" and create a TRUE/FALSE diabetes column based on what it found.
I can already do that; the code would be very similar to what was posted in that other question.
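For reference, here is a minimal base R sketch of that TRUE/FALSE step. It is only an illustration under assumptions, not necessarily the code from the other question: the small `toy` data frame and the names `esrd_cols`, `pattern`, and `hits` are made up for this example.

```r
# Hypothetical stand-in for the real data: three rows, two ESRD columns
toy <- data.frame(
  Date   = c("5/20/2019", "5/20/2019", "10/22/2018"),
  ESRD_1 = c("CKD (chronic kidney disease), stage III [N18.30]; Type 2 diabetes mellitus [E11.9]",
             NA, NA),
  ESRD_2 = c(NA, "CKD (chronic kidney disease), stage III [N18.30];", NA),
  stringsAsFactors = FALSE
)

# Search every column whose name starts with "ESRD"
esrd_cols <- grep("^ESRD", names(toy), value = TRUE)

# Match the word "diabetes" or the literal code "E11.9" (the dot is escaped)
pattern <- "diabetes|E11\\.9"

# One logical vector per column; grepl() returns FALSE for NA cells
hits <- lapply(toy[esrd_cols], function(x) grepl(pattern, x, ignore.case = TRUE))

# A row is TRUE if any of its ESRD columns matched
toy$diabetes <- Reduce(`|`, hits)
```

On the real data, the same lines should work with `df` in place of `toy`, since the column selection is driven by the "ESRD" name prefix rather than by position.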
My question now is about one of the columns I am creating, which wouldn't just be TRUE/FALSE. I'd like to create a column labeled "CKD" with these possible values: "No" (if no code was found), "Stage 1", "Stage 2", "Stage 3", "Stage 4", "Stage 5", "End Stage Renal Disease", and "Unspecified" (there is a specific code for unspecified). Following these rules:
So my end result would look like this: