
I have a data set with multiple diagnosis columns (i.e. DIAG1, DIAG2, DIAG3, etc.). I want to create a loop that checks each of these columns in every row, but I am looking for more than one diagnosis code in those columns.

For example, I want to flag a row if code xxx1 or xxx3 is present in any of DIAG1, DIAG2, DIAG3, etc.

In my code below:
1. `df` is my data frame
2. `df$illness` is the variable I want to create
3. `xxx1` is the code I'm looking for
4. `[26:34, 57:72]` are the columns where DIAG1, etc. exist

**EDIT:** Example data:

DIAG3  DIAG4  DIAG5  DIAG6
1231   xxx1   5468   5468
1454   2352   4542   4864
xxx3   1235   1234   3564
1234   1589   xxx1   8498

Code I tried:

```r
for (i in 1:nrow(df)) {
  df$illness[i] <- ("xxx1" %in% df[i,26:34, 57:72] | "xxx3" %in% 
  df[i,26:34, 57:72]}
```

What I would like my loop to produce:

DIAG3  DIAG4  DIAG5  DIAG6  Illness
1231   xxx1   5468   5468   TRUE
1454   2352   4542   4864   FALSE
xxx3   1235   1234   3564   TRUE
1234   1589   xxx1   8498   TRUE

The code runs but never finishes, and I can't tell where my mistake is. Thank you.

Nhan
  • It would be easier to help if you provided a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and the desired output. – MrFlick Jul 07 '17 at 21:35
  • MrFlick, I added some more information. I hope that helps show what I am trying to do. – Nhan Jul 07 '17 at 22:31

3 Answers


It looks like the subsetting of `df` is wrong. In `df[i, 26:34, 57:72]` the third argument is taken as `drop`, not as more columns, so it should probably be `df[i, c(26:34, 57:72)]`. Is `df$illness[i]` supposed to be a list?
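A minimal sketch of the loop with the column subsetting fixed, assuming a small hypothetical data frame rebuilt from the desired-output table in the question. `diag_cols` and `codes` are illustrative names; `diag_cols` stands in for `c(26:34, 57:72)` in the real data. The question's loop is also missing a closing parenthesis before the `}`; writing the test as `any(codes %in% ...)` sidesteps that.

```r
# Hypothetical example data based on the question's tables
df <- data.frame(DIAG3 = c("1231", "1454", "xxx3", "1234"),
                 DIAG4 = c("xxx1", "2352", "1235", "1589"),
                 DIAG5 = c("5468", "4542", "1234", "xxx1"),
                 DIAG6 = c("5468", "4864", "3564", "8498"),
                 stringsAsFactors = FALSE)

diag_cols <- 1:4                 # assumption: use c(26:34, 57:72) on the real data
codes <- c("xxx1", "xxx3")       # diagnosis codes to look for

df$illness <- NA                 # preallocate a logical result column
for (i in seq_len(nrow(df))) {
  # Columns go in one vector in the j position; unlist() flattens the
  # one-row data frame into a plain character vector
  row_codes <- unlist(df[i, diag_cols])
  df$illness[i] <- any(codes %in% row_codes)
}
df$illness
#> [1]  TRUE FALSE  TRUE  TRUE
```

Looping row by row works, but it is slow on large data; the column-wise approaches in the other answers avoid the per-row subsetting.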

Kevin

We can do this by looping through the columns with `lapply`, using `grepl` to create a logical vector for each column, and then using `Reduce` to collapse them into a single vector with `|`:

```r
df1$Illness <- Reduce(`|`, lapply(df1, grepl, pattern = "xxx"))
df1$Illness
#[1]  TRUE FALSE  TRUE  TRUE
```
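Note that `pattern = "xxx"` matches any code containing "xxx", and `lapply(df1, ...)` visits every column. A sketch of the same `Reduce` idea restricted to exact codes and to the diagnosis columns only, assuming (hypothetically) that those columns are all named `DIAG` followed by digits:

```r
# Hypothetical example data based on the question's tables
df1 <- data.frame(DIAG3 = c("1231", "1454", "xxx3", "1234"),
                  DIAG4 = c("xxx1", "2352", "1235", "1589"),
                  DIAG5 = c("5468", "4542", "1234", "xxx1"),
                  DIAG6 = c("5468", "4864", "3564", "8498"),
                  stringsAsFactors = FALSE)

codes <- c("xxx1", "xxx3")

# Assumption: every diagnosis column is named "DIAG" plus a number
diag_cols <- grep("^DIAG[0-9]+$", names(df1))

# %in% gives one exact-match logical vector per column; Reduce ORs them row-wise
df1$Illness <- Reduce(`|`, lapply(df1[diag_cols], function(col) col %in% codes))
df1$Illness
#> [1]  TRUE FALSE  TRUE  TRUE
```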
akrun

Assuming that `xxx1` is actually meant to be a numeric value, you could compare each column directly with `==` and combine the results with `|`. The comparisons are vectorised over all rows at once, so the result is already a logical TRUE/FALSE column:

```r
library(dplyr)

dat <- data.frame(DIAG3 = c(1231, 1454, 2222, 1234),
                  DIAG4 = c(1111, 2352, 1235, 1589),
                  DIAG5 = c(5468, 4542, 1234, 1111),
                  DIAG6 = c(5468, 4864, 3564, 8498))

# Each == comparison covers every row, so no rowwise() or ifelse() is required
dat %>%
  mutate(Illness = DIAG3 == 1111 | DIAG4 == 1111 | DIAG5 == 1111 | DIAG6 == 1111 |
                   DIAG3 == 2222 | DIAG4 == 2222 | DIAG5 == 2222 | DIAG6 == 2222)
```
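Writing out every comparison gets long with the 25 diagnosis columns in the real data. A base R sketch that does the same check for any set of columns, assuming numeric codes 1111 and 2222 as above; `codes` and `diag_cols` are illustrative names, and `diag_cols` could be `c(26:34, 57:72)` on the real data:

```r
dat <- data.frame(DIAG3 = c(1231, 1454, 2222, 1234),
                  DIAG4 = c(1111, 2352, 1235, 1589),
                  DIAG5 = c(5468, 4542, 1234, 1111),
                  DIAG6 = c(5468, 4864, 3564, 8498))

codes <- c(1111, 2222)
diag_cols <- c("DIAG3", "DIAG4", "DIAG5", "DIAG6")  # assumption: the diagnosis columns

m <- as.matrix(dat[diag_cols])
# %in% returns a plain vector in column-major order; rebuild the row x column shape
hits <- matrix(m %in% codes, nrow = nrow(m))
# A row is flagged if at least one of its diagnosis columns matched
dat$Illness <- rowSums(hits) > 0
dat$Illness
#> [1]  TRUE FALSE  TRUE  TRUE
```

Rebuilding the matrix with `nrow = nrow(m)` keeps the result correct even for a single-row data frame, where `sapply` would simplify to a vector instead.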
B Williams