Count number of new "infections" from time-series data

Question

In this study design, I have weekly measurements of cow milk that were cultured for specific pathogens over a period of N weeks.

I would like to count the number of new infections acquired each week - where new is defined to be the discovery of a pathogen not found in previous weeks for each cow - and summarize these findings based on pathogen type and week.

Suppose I have data in the following long format:

CowId    PathogenType      Week
1234     No Growth         1
1234     S. aureus         2
1234     E. coli           2
1234     No Growth         3
5555     No Growth         1
5555     S. aureus         2
5555     S. aureus         3

In this example, the expected output would be:

PathogenType     Week     N
No Growth        1        2
S. aureus        2        2
E. coli          2        1

Is there a way to accomplish this task using dplyr? I am not able to think of a way to look back in time to ensure a pathogen was not seen in previous weeks for each cow.

score 0 · Answer 1 · answered Feb 21 '21 at 20:29

Maybe something like this:

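library(dplyr)

# Find the first week each pathogen appears in each cow,
# then count those first appearances per pathogen and week
df %>%
  group_by(CowId, PathogenType) %>%
  summarise(Week = min(Week), .groups = "drop") %>%
  count(PathogenType, Week, name = "N")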
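
To check the idea against the sample data in the question, here is a minimal sketch that rebuilds the table by hand and keeps only each cow's first record of each pathogen before counting. The names `df` and `new_infections` are just illustrative, and it assumes "No Growth" is counted like any other pathogen type, as in the expected output.

```r
library(dplyr)

# Sample data copied from the question (long format)
df <- data.frame(
  CowId        = c(1234, 1234, 1234, 1234, 5555, 5555, 5555),
  PathogenType = c("No Growth", "S. aureus", "E. coli", "No Growth",
                   "No Growth", "S. aureus", "S. aureus"),
  Week         = c(1, 2, 2, 3, 1, 2, 3)
)

new_infections <- df %>%
  # Sort so the earliest week comes first within each cow/pathogen pair
  arrange(CowId, PathogenType, Week) %>%
  # Keep only the first row per cow/pathogen pair (its first detection)
  distinct(CowId, PathogenType, .keep_all = TRUE) %>%
  # Count first detections by pathogen and week
  count(PathogenType, Week, name = "N")

new_infections
```

Because the first detection is worked out per cow, a pathogen that shows up in one cow at week 2 and in another cow only at week 5 would be counted once in each of those weeks.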

Jaime Yáñez
