0

I'm hoping to calculate count/sum of each subgroup within groups over a time series.

My question is very similar to this question Rolling Count of Events Over Time Series.

Apologies for cross-posting, I have been looking for ways to count events for each category in group 1 within a time range (Present date and the previous N (say 4) days). I want to repeat this process for every subtype in group 2, i.e. Group 2 is a larger group that may/may not contains all the categories within Group 1.

For example, if we have a data frame that looks like the following

dates = as.Date(c("2011-10-09",
    "2011-10-15",
    "2011-10-16", 
    "2011-10-18", 
    "2011-10-21", 
    "2011-10-22", 
    "2011-10-24")) 
group1=c("A",
     "A",
     "A", 
     "A", 
     "L", 
     "L", 
     "A")
group2=c("I",
     "I",
     "I", 
     "I", 
     "I", 
     "I", 
     "II")

df1 <- data.frame(dates, group1, group2) 

And I'm looking for output similar to this. (Edited) Eventually, I want to spread my dataset so that I will have categories in Group 1 in separate columns, and arrange rows according to dates and Group 2. How can I make sure the count of Group 1 categories is carried forward to the new row (and satisfy the timeframe stated above)?

            dates  group1 group2  count (A)   count (L)
     1 2011-10-09      A      I        1         0
     2 2011-10-15      A      I        1         0
     3 2011-10-16      A      I        2         0
     4 2011-10-18      A      I        3         0
     5 2011-10-21      L      I        0         1
     6 2011-10-22      L      I        0         2
     7 2011-10-24      A      II       1         0

Thanks!

JJJ
  • 1,009
  • 6
  • 19
  • 31
byc
  • 121
  • 10

1 Answers1

0

If you are trying to show the count of all occurrences of events from group1 and group2 in the 4 days leading up to an event, you can just group_by the two event groups and then count (using sapply to access each date separately).

df1 %>%
  group_by(group1, group2) %>%
  mutate(count = sapply(dates
                        , function(x){
                          sum(dates <= x & dates > (x-4))
                          }))

returns:

       dates group1 group2 count
      <date> <fctr> <fctr> <int>
1 2011-10-09      A      I     1
2 2011-10-15      A      I     1
3 2011-10-16      A      I     2
4 2011-10-18      A      I     3
5 2011-10-21      L      I     1
6 2011-10-22      L      I     2
7 2011-10-24      A     II     1
Mark Peterson
  • 9,370
  • 2
  • 25
  • 48
  • Thanks Mark, I used group_by like you suggested here and in the previous post too. Eventually I want to spread my dataset so that I will have categories in Group 1 in separate columns, and arrange rows according to dates and Group 2. How can I make sure the count of Group 1 categories is carried forward to the new row (and satisfy the timeframe stated above)? Thanks – byc Dec 12 '17 at 18:10
  • The spreading seems like a fundamentally different question. You probably need to generate a sample dataset that shows what you want and ask a new question about that. What I have here already answered the question that you asked. (If you generate a new question, feel free to drop the link in a comment and I will take a look if I have time. – Mark Peterson Dec 12 '17 at 19:23
  • Thanks for the tips Mark. I've created another question here https://stackoverflow.com/questions/47781899/fill-count-sum-based-on-previous-row-count-over-time-series – byc Dec 12 '17 at 21:33