I am trying to run a large data check on a database. Some fields in the database are hidden, so the data check needs to ignore all hidden fields. Fields are hidden based on conditional logic stored in the database. I have exported this conditional logic and stored it in a data frame in R. Now I need to automate the data check, either by having the script evaluate the text string of each conditional expression, which I do not think is possible, or by finding a way around this problem.
Below is example code illustrating the problem:
library(dplyr)

id <- c(1001, 1002, 1003, 1004, 1005, 1001, 1002, 1003, 1004, 1005)
target_var <- c("race", "race", "race", "race", "race", "race_other",
                "race_other", "race_other", "race_other", "race_other")
# Mixing numbers and text makes the whole vector character.
value <- c(1, NA, 1, 1, 6, NA, NA, NA, NA, "Asian")
branching_logic <- c(NA, NA, NA, NA, NA,
                     "race == 6", "race == 6", "race == 6",
                     "race == 6", "race == 6")
race <- c(NA, NA, NA, NA, NA, 1, 1, 1, 6, 6)

# The condition race == 6 is hard-coded here; this is the part I want to
# take from the branching_logic column instead.
data <- data.frame(id, target_var, value, branching_logic, race) %>%
  mutate(data_check_result = case_when(
    !is.na(value) ~ "No Missing Data",
    is.na(value) & is.na(branching_logic) ~ "Missing Data 1",
    is.na(value) & race == 6 ~ "Missing Data 2",
    is.na(value) & race != 6 ~ "Hidden field"
  ))
It would be great if I could replace the hard-coded `race == 6` with a variable, or somehow point the script to the conditional expression already saved as a string, but I am not aware of a way to do that in R.
Each row in the example above falls into one of four categories:
- No Missing Data: the value is not NA.
- Missing Data 1: the value is NA and there is no branching logic that could hide the field.
- Missing Data 2: the value is NA and the branching logic is met, so the field is shown.
- Hidden field: the value is NA and the branching logic is NOT met, so the field is hidden.
I have thousands of fields to check with accompanying branching logic, so I need a way to use the branching logic saved in the "branching_logic" column within the script.
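One possible direction, as a minimal sketch: base R can turn a string into an expression with `parse(text = ...)` and evaluate it with `eval()`, using the columns of a row as variables. The sketch assumes every logic string is already valid R syntax and that each variable it refers to exists as a column in the same row, as in the example above. The helper name `logic_met` and the data frame name `checks` are made up for illustration. Because `eval()` runs whatever code the string contains, this is only reasonable for logic strings from a trusted source.

```r
# A reduced version of the example data, rebuilt so the sketch runs on its own.
checks <- data.frame(
  target_var      = c("race", "race", "race_other", "race_other", "race_other"),
  value           = c("1", NA, NA, NA, "Asian"),
  branching_logic = c(NA, NA, "race == 6", "race == 6", "race == 6"),
  race            = c(NA, NA, 1, 6, 6),
  stringsAsFactors = FALSE
)

# Hypothetical helper: parse one logic string and evaluate it with the columns
# of a single row as variables. Returns NA when the row has no logic; a
# condition that evaluates to NA is treated as "not met".
logic_met <- function(logic, row) {
  if (is.na(logic)) return(NA)
  isTRUE(eval(parse(text = logic), envir = row))
}

# Evaluate each row's own branching logic.
checks$shown <- vapply(
  seq_len(nrow(checks)),
  function(i) logic_met(checks$branching_logic[i], checks[i, , drop = FALSE]),
  logical(1)
)

# Same four categories as in the question, without hard-coding race == 6.
checks$data_check_result <- ifelse(
  !is.na(checks$value), "No Missing Data",
  ifelse(is.na(checks$branching_logic), "Missing Data 1",
         ifelse(checks$shown, "Missing Data 2", "Hidden field"))
)

print(checks[, c("target_var", "value", "branching_logic", "data_check_result")])
```

If the exported logic uses a different syntax than R, the strings would need to be translated before parsing; that step is not covered here.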
IMPORTANT NOTE: The case here is the simplest case. Many target_var and value entries have branching logic that looks at multiple other variables to determine whether to hide the field (e.g. race==6 & race==1).
This is only my second time posting, and I usually do not see such in-depth problems here, but it would be great if someone has an idea!
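For logic that involves several variables, a parsed expression can also be inspected before it is evaluated. The following is a rough sketch rather than a tested solution: `all.vars()` lists the variable names an expression refers to, so the check can confirm that they are all available first. The variable `age` and the `record` list are hypothetical and not part of the example data.

```r
# Hypothetical compound logic; `age` does not exist in the original example.
logic <- "race == 6 & age >= 18"
expr <- parse(text = logic)

# Names of the variables the expression depends on.
needed <- all.vars(expr)

# One record, given as a named list of already-collected values.
record <- list(race = 6, age = 30)

# Only evaluate when every referenced variable is available in the record.
shown <- NA
if (all(needed %in% names(record))) {
  shown <- isTRUE(eval(expr, envir = record))
}

print(needed)
print(shown)
```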