I'd like to summarize time series data by consecutive runs of the Flare column, keeping for each run the row with the maximum FlareLength.

Suppose I have a data frame like this:


   Date           Flare       FlareLength
1  2015-12-01     0           1
2  2015-12-02     0           2
3  2015-12-03     0           3
4  2015-12-04     0           4
5  2015-12-05     0           5
6  2015-12-06     0           6
7  2015-12-07     1           1
8  2015-12-08     1           2
9  2015-12-09     1           3
10 2015-12-10     1           4
11 2015-12-11     0           1
12 2015-12-12     0           2
13 2015-12-13     0           3
14 2015-12-14     0           4
15 2015-12-15     0           5
16 2015-12-16     0           6
17 2015-12-17     0           7
18 2015-12-18     0           8
19 2015-12-19     0           9
20 2015-12-20     0          10
21 2015-12-21     0          11
22 2016-01-11     1           1
23 2016-01-12     1           2
24 2016-01-13     1           3
25 2016-01-14     1           4
26 2016-01-15     1           5
27 2016-01-16     1           6
28 2016-01-17     1           7
29 2016-01-18     1           8

I'd like output like:

  Date           Flare       FlareLength
1 2015-12-06     0           6
2 2015-12-10     1           4
3 2015-12-21     0          11
4 2016-01-18     1           8

I have tried various forms of aggregate, but I'm not very familiar with handling the time series aspect.

user1895891
  • Hey there, could you provide some r code you have tried to use so far, so we can figure out what part you're stumped on? – CausingUnderflowsEverywhere Jan 29 '20 at 06:21
  • I'm sorry. Good point. I'm fairly new to this. Thanks for your patience. I ended up posting a more complete question with code here: https://stackoverflow.com/questions/59978973/use-dplyr-to-summarize-but-preserve-date-of-group-row – user1895891 Jan 30 '20 at 04:15

1 Answer

Using dplyr, we can create a grouping variable that increments whenever FlareLength drops below its previous value (that is, whenever a new run starts), and then select the row with the maximum FlareLength in each group.

library(dplyr)

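df %>%
  # gr increases by 1 each time FlareLength falls below the previous value
  group_by(gr = cumsum(FlareLength < lag(FlareLength, 
                       default = first(FlareLength)))) %>%
  # keep the row with the largest FlareLength in each run
  slice(which.max(FlareLength)) %>%
  ungroup() %>%
  # drop the helper grouping column
  select(-gr)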

# A tibble: 4 x 3
#  Date       Flare FlareLength
#  <fct>      <int>       <int>
#1 2015-12-06     0           6
#2 2015-12-10     1           4
#3 2015-12-21     0          11
#4 2016-01-18     1           8
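
To make the grouping step easier to follow, here is a minimal base R sketch on a made-up vector. The name `flare_length` and its values are hypothetical and only mimic a counter that restarts at 1 for each run; the sketch illustrates the idea rather than reproducing the pipeline above.

```r
# Hypothetical toy counter: restarts at 1 whenever a new run begins
flare_length <- c(1, 2, 3, 1, 2, 1, 2, 3, 4)

# TRUE wherever the value drops relative to the previous element,
# which marks the first row of a new run; the first element has no
# predecessor, so it is treated as "no reset"
is_reset <- c(FALSE, diff(flare_length) < 0)

# The running count of resets serves as a run id
run_id <- cumsum(is_reset)

# Maximum of the counter within each run
run_max <- tapply(flare_length, run_id, max)

print(run_id)
print(run_max)
```

Each drop in the counter bumps `run_id` by one, so rows between two drops share an id and can be summarized together.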

In base R, the same result can be obtained with ave:

subset(df, FlareLength == ave(FlareLength,
                              cumsum(c(TRUE, diff(FlareLength) < 0)),
                              FUN = max))
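
Both snippets above assume that FlareLength restarts at a lower value whenever a new run begins. If that cannot be relied on, one possible variant is to group on consecutive runs of Flare itself. The sketch below is an alternative under that assumption, not the method used above; it rebuilds the question's data so it can be run on its own, and the names `runs`, `run_id`, and `result` are introduced here for illustration.

```r
# Rebuild the example data from the question
df <- data.frame(
  Date = c(seq(as.Date("2015-12-01"), as.Date("2015-12-21"), by = "day"),
           seq(as.Date("2016-01-11"), as.Date("2016-01-18"), by = "day")),
  Flare = rep(c(0L, 1L, 0L, 1L), times = c(6, 4, 11, 8)),
  FlareLength = c(1:6, 1:4, 1:11, 1:8)
)

# Run-length encode Flare, then expand to one run id per row
runs <- rle(df$Flare)
run_id <- rep(seq_along(runs$lengths), times = runs$lengths)

# Keep the rows whose FlareLength equals the maximum within their run
# (ties within a run, if any, would all be kept)
result <- df[df$FlareLength == ave(df$FlareLength, run_id, FUN = max), ]

print(result)
```

On this data the variant selects the same four rows as the desired output in the question, since each run of Flare coincides with one run of the counter.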
Ronak Shah