I have a large dataframe with multiple columns (about 150).
There is a range of columns (Dx1, Dx2..until Dx30) which are diagnosis codes (the codes are numbers, but they are categorical variables that correspond to a medical diagnosis using the ICD-9 coding system).
I have working code to search a single column, but need to search all 30 columns to see if any of the columns contain a code within the specified range (DXrange).
The core dataframe looks like:
Case DX1 DX2 DX3 DX4...DX30
1 123 345 567 99 12
2 234 345 NA NA NA
3 456 567 789 345 34
Here is the working code:
## Defines a range of codes to search for
DXrange <- factor(41000:41091, levels = levels(core$DX1))
## Search for the DXrange codes in column DX1.
core$IndexEvent <- core$DX1 %in% DXrange & substr(core$DX1, 5, 5) != 2
## What is the frequency of the IndexEvent?
cat("Frequency of IndexEvent : \n"); table(core$IndexEvent)
The working code is adapted from "Calculating Nationwide Readmissions Database (NRD) Variances, Report # 2017-01"
I could run this for each DX column and then sum them for a final IndexEvent total, but this is not very efficient.