I have a data set of patient diagnostic (ICD-9) codes, each 3 to 5 digits long. The first three digits give the diagnosis classification, and the 4th and 5th digits refine it further. For example:
zz<-" dx1 dx2 dx3
1 64251 82381 8100
2 8052 8730 51881
3 64421 431 81601
4 3041 29690 9920
5 72888 8782 59080
6 7245 60886 8479
7 291 4659 4739
8 30410 30400 95901
9 2929 30500 8208
10 7840 6268 8052"
df<-read.table(text=zz, header=TRUE)
Each row of codes represents multiple diagnoses of the same individual. I have written a series of ifelse statements to create a new variable with the codes I’m interested in so they are mapped to numbers representing different diagnoses of interest:
# Map selected ICD-9 prefixes in dx1 to diagnosis groups 1-3 (0 = no match)
df$x<-ifelse(grepl("^291", df$dx1),1, ifelse(grepl("^292", df$dx1),1,
ifelse(grepl("^3040", df$dx1),2,ifelse(grepl("^3047", df$dx1),2,
ifelse(grepl("^3051", df$dx1),3,ifelse(grepl("^98984", df$dx1),3,0))))))
Where I run into trouble is when I want to check for these select codes across each of the columns containing diagnostic codes. I attempted to write a function for this:
df$alldx<-apply(df[,c(1:3)],MARGIN = 2, function(dx){
ifelse(grepl("^291", dx),1, ifelse(grepl("^292", dx),1,
ifelse(grepl("^3040", dx),2,ifelse(grepl("^3047", dx),2,
ifelse(grepl("^3051", dx),3,ifelse(grepl("^98984", dx),3,0))))))
})
The problem is I only want to count an individual once if they have one of the codes of interest; in the case of multiple code matches, then that person’s code should be whichever diagnosis was given first. I feel like there must be a way to do this, but it’s well beyond my coding abilities!
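To make the desired result concrete, here is a minimal sketch of one possible approach. It assumes "given first" means the leftmost column (dx1, then dx2, then dx3) that matches any code of interest. The helper name `classify_dx` is hypothetical, and this is only an illustration of the target behaviour, not a tested solution for the full data set.

```r
# Sample data from above
zz <- " dx1 dx2 dx3
1 64251 82381 8100
2 8052 8730 51881
3 64421 431 81601
4 3041 29690 9920
5 72888 8782 59080
6 7245 60886 8479
7 291 4659 4739
8 30410 30400 95901
9 2929 30500 8208
10 7840 6268 8052"
df <- read.table(text = zz, header = TRUE)

# Hypothetical helper: map a vector of ICD-9 codes to groups 1-3 (0 = no match)
classify_dx <- function(dx) {
  dx <- as.character(dx)
  ifelse(grepl("^(291|292)", dx), 1,
  ifelse(grepl("^(3040|3047)", dx), 2,
  ifelse(grepl("^(3051|98984)", dx), 3, 0)))
}

# Classify every column; the result is a matrix with one row per patient
codes <- sapply(df[, c("dx1", "dx2", "dx3")], classify_dx)

# For each patient (row), keep the first non-zero group, or 0 if none matched
df$alldx <- apply(codes, 1, function(r) {
  hit <- r[r != 0]
  if (length(hit) > 0) hit[[1]] else 0
})

df$alldx
# for the sample data: 0 0 0 0 0 0 1 2 1 0
```

The key difference from my attempt is that the classification is done per column first, and the result is then collapsed per row (`MARGIN = 1`), so each individual ends up with a single value.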