Edited to add code:
I am trying to replicate some work from a colleague who uses SAS. We're having an issue with the SAS import, which converts text that looks like a boolean value to numeric.
The purpose of this work is to identify particular records to pass on, so we need the values to be preserved as originally imported (something I think R will be able to do). Right now we're fixing the issue manually because it's a small number of records but that may not always be true.
Where I'm hitting a snag is that I need to replicate their matrix array in R. There are multiple conditions, and each record should be flagged with a 1 for every condition it meets, as follows: SAS Code
I need to be able to evaluate whether any one of 34 potential strings (or partial strings) appears in any one of 12 columns. In SAS, the colon shortens a comparison value to the same length as the evaluation value and compares them, so e.g. :Q16 means the string only needs to start with Q16. Additionally, any one of the 12 columns could hold a value, though the data does get sparser in later fields.
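To illustrate the prefix comparison described above, here is a minimal sketch using base R's `startsWith()`. The vector `codes` and the choice of `"Q16"` are made-up illustration values, not part of the real data, and this only approximates the SAS colon behaviour for the "starts with" case.

```r
# Hypothetical example values; not taken from the real data set
codes <- c("Q168", "Q16", "Q1", "H9190", NA)

# startsWith() is TRUE when the string begins with the prefix,
# similar in spirit to a SAS comparison such as `=: 'Q16'`.
# A missing value gives NA rather than FALSE.
is_q16 <- startsWith(codes, "Q16")

# Treat missing values as "no match" and convert to a 0/1 flag
flag <- as.integer(!is.na(is_q16) & is_q16)
flag
```

Handling `NA` explicitly matters here because most of the later icd columns are empty.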
I am trying to find the most efficient and compact approach, if possible.
I'm still somewhat new to R for more complex problems, so I am stymied. I've tried a few approaches with grep and grepl, but none have borne any fruit. When I tried regex, I first used each string individually in ifelse and then tried one larger string with the "|" operator, but had no luck either way. I also tried base (apply) and dplyr approaches.
Any help is appreciated.
The structure of the data is: Example Table
Code for Example Data:
structure(list(record = 1:20,
icd1 = c("Q753", "Q620", "Q825", "Q211", "Q828", "Q6532", "Q673", "Q380", "Q5310", "Q040", "Q107", "Q6689", "Q860", "Q753", "Q000", "Q673", "Q860", "Q673", "H9190", "Q381"),
icd2 = c("Q141",NA,NA, "Q170", NA, NA, NA, NA, NA, NA, NA, "Q211", NA, NA, "Q211", "Q673", NA, "115", "Q759", "Q753"),
icd3 = c("Q579", NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "Q038", "H4657", "Q211"),
icd4 = c("Q656", NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "Q999", NA, NA),
icd5 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "Q5301", NA, NA),
icd6 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "Q168", NA, NA),
icd7 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA),
icd8 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA),
icd9 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA),
icd10 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA),
icd11 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA),
icd12 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)),
.Names = c("record", "icd1", "icd2", "icd3", "icd4", "icd5", "icd6", "icd7", "icd8", "icd9", "icd10", "icd11", "icd12"),
class = "data.frame", row.names = c(NA, -20L))
Strings of Interest:
case2 <- "^H4703| ^H90*| ^H91*| ^Q000| ^Q001| ^Q002| ^Q01*| ^Q02| ^Q03*|
^Q04*| ^Q05*| ^Q070*| ^Q110| ^Q111| ^Q112| ^Q120| ^Q122| ^Q130| ^Q138|
^Q139| ^Q141| ^Q142| ^Q143| ^Q148| ^Q149| ^Q16*| ^Q65*| ^Q66*| ^Q674|
^Q688| ^Q743| ^Q758| ^Q759| ^Q828"
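For reference, a minimal sketch of how such a flag might be computed, under some assumptions. In the pattern above, the spaces and line breaks between alternatives become literal characters in the regex, and `*` is a quantifier ("zero or more of the previous character") rather than a wildcard, which may be why nothing matches. The sketch below assumes every code is meant as a prefix (codes that need an exact match would need a trailing `$`), uses a small stand-in data frame `df` with only the first five example records and three of the twelve icd columns, and introduces the names `prefixes`, `pattern`, `icd_cols` and `hits` purely for illustration.

```r
# Stand-in for the real data: records 1-5 of the example, 3 of 12 columns
df <- data.frame(
  record = 1:5,
  icd1 = c("Q753", "Q620", "Q825", "Q211", "Q828"),
  icd2 = c("Q141", NA, NA, "Q170", NA),
  icd3 = c("Q579", NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

# The 34 codes, all treated as prefixes (an assumption)
prefixes <- c("H4703", "H90", "H91", "Q000", "Q001", "Q002", "Q01", "Q02",
              "Q03", "Q04", "Q05", "Q070", "Q110", "Q111", "Q112", "Q120",
              "Q122", "Q130", "Q138", "Q139", "Q141", "Q142", "Q143", "Q148",
              "Q149", "Q16", "Q65", "Q66", "Q674", "Q688", "Q743", "Q758",
              "Q759", "Q828")

# One anchored alternation with no stray spaces: ^(H4703|H90|...|Q828)
pattern <- paste0("^(", paste(prefixes, collapse = "|"), ")")

# Every column whose name starts with "icd"
icd_cols <- grep("^icd", names(df), value = TRUE)

# grepl() returns FALSE for NA, so empty fields never match
hits <- lapply(df[icd_cols], function(col) grepl(pattern, col))

# Flag a record with 1 if any of its icd columns matched
df$case2 <- as.integer(Reduce(`|`, hits))
df
```

Because the columns are found by name, the same code should extend to all twelve icd columns without changes.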