Three doctors each diagnose a patient.

Question 1: how can I keep only the patients whom all 3 doctors diagnose with disease B (no matter whether it is B.1, B.2 or B.3)?

Question 2: how can I keep only the patients whom at least one of the 3 doctors diagnoses with disease A?

set.seed(20200107)
df <- data.frame(id = rep(1:5,each =3),
                 disease = sample(c('A','B'), 15, replace = TRUE))
df$disease <-  as.character(df$disease)
df[1,2] <- 'A'
df[4,2] <- 'B.1'
df[5,2] <- 'B.2'
df[6,2] <- 'B.3'
df

I have an approach in mind, but I don't know how to write the code. I think the any() or all() function should be used.

First, I want to group the patients by id.

Second, within each group, check whether the diseases are all B (question 1) or whether any of them is A (question 2).

The code I have in mind looks something like this:

df %>% group_by(id) %>% filter_all(all_vars(disease == B))
zhiwei li

2 Answers

You can use `all()`, assuming every patient is seen by exactly 3 doctors. It checks every row in the patient's group.

library(dplyr)
df %>% group_by(id) %>% summarise(disease_B = all(grepl('B', disease)))

#     id disease_B
#  <int> <lgl>    
#1     1 FALSE    
#2     2 TRUE     
#3     3 FALSE    
#4     4 FALSE    
#5     5 FALSE    

If you want to keep the rows of those patients instead of a summary, you can use `filter()`:

df %>% group_by(id) %>% filter(all(grepl('B', disease)))
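As a minimal sketch of what this returns, the example below rebuilds the data given at the end of this answer and keeps only the patients whose diagnoses are all B. The anchored pattern `"^B"` and the name `all_b` are my own choices, not part of the original answer; the anchor only guards against labels that merely contain a B somewhere.

```r
library(dplyr)

# Same values as the fixed data at the end of this answer
df <- data.frame(
  id = rep(1:5, each = 3),
  disease = c("A", "A", "A", "B.1", "B.2", "B.3", "B", "A", "A",
              "B", "A", "A", "B", "A", "B"),
  stringsAsFactors = FALSE
)

# Keep all rows of a patient only if every diagnosis starts with "B"
# ("^B" is an assumption; it covers B, B.1, B.2 and B.3)
all_b <- df %>%
  group_by(id) %>%
  filter(all(grepl("^B", disease))) %>%
  ungroup()

unique(all_b$id)
# [1] 2
```

Only patient 2 survives here, with all three of their rows kept.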

For question 2, we can similarly use `any()`:

df %>% group_by(id) %>% summarise(disease_A = any(grepl('A', disease)))
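If the goal is to keep the rows rather than a TRUE/FALSE summary, the same idea should work inside `filter()`. This is a sketch on the fixed data from this answer; the name `any_a` and the exact comparison `disease == "A"` are my assumptions (an exact match is enough because A has no subtypes in this data).

```r
library(dplyr)

# Same values as the fixed data at the end of this answer
df <- data.frame(
  id = rep(1:5, each = 3),
  disease = c("A", "A", "A", "B.1", "B.2", "B.3", "B", "A", "A",
              "B", "A", "A", "B", "A", "B"),
  stringsAsFactors = FALSE
)

# Keep all rows of a patient if at least one doctor diagnosed "A"
any_a <- df %>%
  group_by(id) %>%
  filter(any(disease == "A")) %>%
  ungroup()

unique(any_a$id)
# [1] 1 3 4 5
```

Patient 2 is the only one dropped, since none of their diagnoses is A.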

data

df <- structure(list(id = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 
4L, 4L, 5L, 5L, 5L), disease = c("A", "A", "A", "B.1", "B.2", 
"B.3", "B", "A", "A", "B", "A", "A", "B", "A", "B")), row.names = c(NA, 
-15L), class = "data.frame")
Ronak Shah
  • I think your output of the first command is wrong, the command works but you should get 2 and 4 TRUE. – dc37 Jan 07 '20 at 08:29
  • @dc37 For 4 I have values "B", "A", "A". Do you have different values? – Ronak Shah Jan 07 '20 at 08:33
  • Yes, I have only B on id 4, but I guess it is because of the use of `sample` in the definition of the dataframe. My bad. – dc37 Jan 07 '20 at 14:18

For question 1, you can replace B.1, B.2, ... with B, then count the occurrences of each "Disease" per patient, and keep only the rows where the disease is B and the count equals 3:

library(tidyverse)
df %>% group_by(id) %>% 
  mutate(Disease = gsub("\\.[0-9]+", "", disease)) %>% 
  count(Disease) %>% 
  filter(n == 3 & Disease == "B")

# A tibble: 2 x 3
# Groups:   id [2]
     id Disease     n
  <int> <chr>   <int>
1     2 B           3
2     4 B           3

For question 2, similarly, you can replace B.1, ... with B, keep only the rows where Disease is A, then count the rows per patient. The output is the patient id and the number of doctors who diagnosed disease A:

df %>% group_by(id) %>% 
  mutate(Disease = gsub("\\.[0-9]+", "", disease)) %>% 
  filter(Disease == "A") %>%  
  count(id)


# A tibble: 3 x 2
# Groups:   id [3]
     id     n
  <int> <int>
1     1     1
2     3     3
3     5     2
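For comparison, the same per-patient counts can probably be obtained without dplyr. The sketch below uses base R `tapply()` on the fixed data posted in the other answer, so the numbers differ from the tibble above, which came from a different random sample (see the comments under that answer). The names `n_a` and `all_b` are hypothetical.

```r
# Fixed data from the other answer (not the sample() output used above)
df <- data.frame(
  id = rep(1:5, each = 3),
  disease = c("A", "A", "A", "B.1", "B.2", "B.3", "B", "A", "A",
              "B", "A", "A", "B", "A", "B"),
  stringsAsFactors = FALSE
)

# Question 2: number of doctors per patient who diagnosed "A"
n_a <- tapply(df$disease == "A", df$id, sum)
names(n_a)[n_a > 0]   # patients with at least one "A" diagnosis
# [1] "1" "3" "4" "5"

# Question 1: TRUE for patients whose diagnoses all start with "B"
all_b <- tapply(grepl("^B", df$disease), df$id, all)
names(all_b)[all_b]
# [1] "2"
```

A count greater than zero is equivalent to `any()`, and a count equal to the group size is equivalent to `all()`, which is why both answers agree on the same data.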
dc37