
In this study design, I have weekly milk samples from cows, each cultured for specific pathogens, over a period of N weeks.

I would like to count the number of new infections acquired each week and summarize these counts by pathogen type and week. A detection is "new" if that pathogen was not found in the same cow in any previous week.

Suppose I have data in the following long format:

CowId    PathogenType      Week
1234     No Growth         1
1234     S. aureus         2
1234     E. coli           2
1234     No Growth         3
5555     No Growth         1
5555     S. aureus         2
5555     S. aureus         3
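For a reproducible example, the sample data above can be built as a data frame. The object name `df` is an assumption, chosen because the answer's code refers to `df`.

```r
# Sample data from the question, in long format:
# one row per cow, pathogen detected, and week
df <- data.frame(
  CowId        = c(1234, 1234, 1234, 1234, 5555, 5555, 5555),
  PathogenType = c("No Growth", "S. aureus", "E. coli", "No Growth",
                   "No Growth", "S. aureus", "S. aureus"),
  Week         = c(1, 2, 2, 3, 1, 2, 3),
  stringsAsFactors = FALSE
)
```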

In this example, the expected output would be:

PathogenType     Week     N
No Growth        1        2
S. aureus        2        2
E. coli          2        1

Is there a way to do this with dplyr? I can't think of a way to look back in time to check that a pathogen was not seen in any previous week for the same cow.
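The "look back" can be replaced by a grouping: within each cow/pathogen pair, a detection is new exactly when its week is the earliest week in which that pair appears. A minimal sketch that flags each row under that definition, assuming the dplyr package and rebuilding the sample data so it runs on its own (`df_flagged` and `is_new` are hypothetical names):

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  CowId        = c(1234, 1234, 1234, 1234, 5555, 5555, 5555),
  PathogenType = c("No Growth", "S. aureus", "E. coli", "No Growth",
                   "No Growth", "S. aureus", "S. aureus"),
  Week         = c(1, 2, 2, 3, 1, 2, 3),
  stringsAsFactors = FALSE
)

df_flagged <- df %>%
  group_by(CowId, PathogenType) %>%
  # A row is "new" if no earlier week has the same cow/pathogen pair,
  # i.e. its week equals the minimum week within that pair
  mutate(is_new = Week == min(Week)) %>%
  ungroup()

print(df_flagged)
```

On the sample data this marks week 3 "No Growth" for cow 1234 and week 3 "S. aureus" for cow 5555 as not new, because each pair was already seen in an earlier week.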

Ronak Shah

1 Answer


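Maybe something like this: group by both cow and pathogen, keep each pair's first week of detection, then count by pathogen and week.

library(dplyr)

df %>%
  group_by(CowId, PathogenType) %>%      # one group per cow/pathogen pair
  filter(Week == min(Week)) %>%          # keep only the first week each pair was seen
  ungroup() %>%
  count(PathogenType, Week, name = "N")  # new infections per pathogen and week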
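An alternative sketch that gives the same result on the sample data, assuming dplyr: sort by week, keep the first row for each cow/pathogen pair with `distinct()`, then count. The data frame is rebuilt so the block runs on its own; `new_infections` is a hypothetical name.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  CowId        = c(1234, 1234, 1234, 1234, 5555, 5555, 5555),
  PathogenType = c("No Growth", "S. aureus", "E. coli", "No Growth",
                   "No Growth", "S. aureus", "S. aureus"),
  Week         = c(1, 2, 2, 3, 1, 2, 3),
  stringsAsFactors = FALSE
)

new_infections <- df %>%
  arrange(CowId, Week) %>%
  # distinct() keeps the first row of each cow/pathogen combination,
  # which after sorting is the earliest week that pathogen was seen
  distinct(CowId, PathogenType, .keep_all = TRUE) %>%
  # Number of new detections per pathogen and week
  count(PathogenType, Week, name = "N")

print(new_infections)
```

Because `distinct()` keeps a single row per cow/pathogen pair, duplicate records of the same pathogen for the same cow in the same week are counted once, whereas the `filter(Week == min(Week))` version keeps both rows and counts them twice.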