Conditional count based on list of repeated IDs

Question

I cannot seem to get this to work or find the answer. I have a data frame like this:

PatientID <- c("1", "1", "1", "1", "2", "2", "2", "2", "3")
hospital.time <- c(1, 1, 1, 2, 1, 2, 3, 4, 1)
fever <- c(1, 1, NA, 0, 1, NA, 1, 1, NA)
ventilator <- c(1, 0, 1, 1, 0, 1, 0, 1, NA)
df <- data.frame(PatientID, hospital.time, fever, ventilator)

Each patient has several measurements, so the ID is repeated for each measurement. I would like to count how many patients in hour 1 have a fever and are on a ventilator, how many have only a fever, and how many are only on a ventilator, and then the same for hour 2, hour 3, etc.

I have tried using boolean and dplyr based on PatientID, but no luck. Will I have to put this in a for loop to make it work?

Hope you can help.

Welcome to StackOverflow! Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269). This will make it much easier for others to help you. — Sotos, Feb 02 '18 at 14:54
Perhaps it would help to share your desired output from the sample data that you've shared. — A5C1D2H2I1M1N2O1R2T1, Feb 02 '18 at 16:04
Good idea. I would like the output to have one line per PatientID with all the data, like this: — User LG, Feb 02 '18 at 21:13
For each PatientID I would like one row. Then the columns would be like this: "ID", "Hour 1 Fever", "Hour 1 Ventilator", "Hour 1 ventilator & fever", "Hour 2 Fever", "Hour 2 Ventilator", "Hour 2 ventilator & fever", "Hour 3...." etc. — User LG, Feb 02 '18 at 21:20

score 0 · Answer 1 · answered Feb 02 '18 at 14:59

Here's a way using dplyr:

library(dplyr)

pid <- c("1", "1", "1", "1", "2", "2", "2", "2", "3")
hospital.time <- c(1, 1, 1, 2, 1, 2, 3, 4, 1)
fever <- c(1, 1, NA, 0, 1, NA, 1, 1, NA)
ventilator <- c(1, 0, 1, 1, 0, 1, 0, 1, NA)
df <- data.frame(pid, hospital.time, fever, ventilator)

dfg <- df %>%
  mutate(fv = ifelse(fever == 1 & ventilator == 1, 1, 0)) %>%
  group_by(pid) %>%
  summarise(f = sum(fever, na.rm = TRUE),
            v = sum(ventilator, na.rm = TRUE),
            fv = sum(fv, na.rm = TRUE))
dfg

Output:

     pid     f     v    fv
  (fctr) (dbl) (dbl) (dbl)
1      1     2     3     1
2      2     3     2     1
3      3     0     0     0

I don't think that line breaks cost anything extra, but they give the reader good returns..... — A5C1D2H2I1M1N2O1R2T1, Feb 02 '18 at 16:00

score 0 · Answer 2 · answered Feb 02 '18 at 15:03

Another way with dplyr:

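library(dplyr)

df %>%
  group_by(PatientID, hospital.time) %>%
  summarise(f = ifelse(sum(fever, na.rm = TRUE) > 0, 1, 0),
            v = ifelse(sum(ventilator, na.rm = TRUE) > 0, 1, 0),
            # both: fever and ventilator each recorded at least once in that hour
            fandV = ifelse(sum(fever, na.rm = TRUE) > 0 &
                             sum(ventilator, na.rm = TRUE) > 0, 1, 0))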

This groups by PatientID and hospital.time and returns a binary value for each ID and hour whether or not they had a fever, a ventilator, or both.
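
As a possible extension (a sketch, not part of the original answer), the per-patient, per-hour flags can be summed by hour to get patient counts. This assumes dplyr 1.0.0 or later for the `.groups` argument, treats `NA` as "not present", and counts a patient as "both" when fever and ventilator were each recorded at least once in that hour. The names `per_patient_hour`, `per_hour`, and the output column names are made up for illustration.

```r
library(dplyr)

df <- data.frame(
  PatientID     = c("1", "1", "1", "1", "2", "2", "2", "2", "3"),
  hospital.time = c(1, 1, 1, 2, 1, 2, 3, 4, 1),
  fever         = c(1, 1, NA, 0, 1, NA, 1, 1, NA),
  ventilator    = c(1, 0, 1, 1, 0, 1, 0, 1, NA)
)

# Step 1: one row per patient and hour, with logical flags.
# `x %in% 1` is FALSE for NA, so missing values count as "absent" (assumption).
per_patient_hour <- df %>%
  group_by(PatientID, hospital.time) %>%
  summarise(
    f = any(fever %in% 1),
    v = any(ventilator %in% 1),
    .groups = "drop"
  )

# Step 2: count patients per hour in each category.
per_hour <- per_patient_hour %>%
  group_by(hospital.time) %>%
  summarise(
    fever_and_ventilator = sum(f & v),
    fever_only           = sum(f & !v),
    ventilator_only      = sum(!f & v),
    .groups = "drop"
  )

per_hour
```

Because each patient contributes at most one row per hour after step 1, repeated measurements within the same hour are not double-counted.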

Steven

Thanks Steven, I think that is part of the way, but if I would like to know how many patients had a fever in hour 1, fever and ventilator in hour 1, ventilator in hour 1, the same for hour 2 etc. Then I think I need to change it to have one row with patient ID and then columns for hour 1 fever, hour 1 ventilator, hour 1 f&v, hour 2 fever, etc. etc. That is the only way I can see that I could then summarize for each column? – User LG Feb 02 '18 at 22:15
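
One way the layout described in this comment might be produced is to reshape the per-hour flags to wide format. This is only a sketch: it assumes tidyr 1.0.0 or later (for `pivot_wider()` and `names_glue`) and dplyr 1.0.0 or later, and the column names such as `hour1_fever` and the helper column `both` are invented for illustration. Hours with no measurement for a patient come out as `NA`.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  PatientID     = c("1", "1", "1", "1", "2", "2", "2", "2", "3"),
  hospital.time = c(1, 1, 1, 2, 1, 2, 3, 4, 1),
  fever         = c(1, 1, NA, 0, 1, NA, 1, 1, NA),
  ventilator    = c(1, 0, 1, 1, 0, 1, 0, 1, NA)
)

wide <- df %>%
  # Collapse repeated measurements to one 0/1 flag per patient and hour.
  group_by(PatientID, hospital.time) %>%
  summarise(
    fever      = as.integer(any(fever %in% 1)),
    ventilator = as.integer(any(ventilator %in% 1)),
    .groups = "drop"
  ) %>%
  mutate(both = as.integer(fever == 1 & ventilator == 1)) %>%
  # One row per patient, one column per hour and measure.
  pivot_wider(
    id_cols     = PatientID,
    names_from  = hospital.time,
    values_from = c(fever, ventilator, both),
    names_glue  = "hour{hospital.time}_{.value}"
  )

# Column sums give the number of patients per hour and category.
colSums(wide[-1], na.rm = TRUE)
```

If only the counts are needed, the wide step is optional: grouping the long per-patient, per-hour table by `hospital.time` and summing the flags gives the same numbers.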
