0

I have a flood table with several stations, dates and discharge and I'm trying to identify a date for each flood. The flood can last several days and so I would like to create an ID to number the floods, I would have the same number when the date is consecutive with the previous row and if it's not consecutive, the ID would be the previous one + 1. The column Diffdays counts the days between the current row and the previous one.

My data looks like this:

Station      Date            Diffdays   Discharge  Desired counter
Y6042010     1926-11-19      NA         89         1
Y6042010     1928-10-22      703        100        2
Y6042010     1928-10-23      1          115        2
W2022030     2000-04-03      NA         12         3
W2022030     2000-04-04      1          16         3
W2022030     2001-11-13      588        14         4

Any idea or suggestion? I tried several things but I can't find something that works. The code below would be the closest one I think but I get an error:

Flood <- Flood %>%
  group_by(Station) %>%
  mutate(Flood_counter = ifelse(Diffdays != 1, (Flood_counter - 1) + 1, Flood_counter - 1))
Jude
  • 153
  • 2
  • 8
  • 1
    You may use `cumsum(...diff(...` idiom (`cumsum(c(TRUE, diff(Date) != 1L))`), as described here: [Create grouping variable for consecutive sequences and split vector](https://stackoverflow.com/questions/5222061/create-grouping-variable-for-consecutive-sequences-and-split-vector). Make sure that your date is of class `Date`. You may also use the convenience function `collapse::seqid`. – Henrik Mar 29 '21 at 19:20
  • Please provide exemplary data as the output from ```dput()```. – rjen Mar 29 '21 at 19:20

1 Answers1

1

Construct a logical vector using the condition that the current date is not one day after the prior date (or has no prior date). Then do a cumulative sum on the vector.

tibble(Date = as.Date(c("1926-11-19", "1928-10-22", "1928-10-23", 
                "2000-04-03", "2000-04-04", "2001-11-13"))) %>%
    mutate(
        first_one = seq_along(Date) == 1,
        sequential = Date == dplyr::lag(Date) + 1,
        id = cumsum(first_one | !sequential)
    )

Result

# A tibble: 6 x 3
  Date       sequential    id
  <date>     <lgl>      <dbl>
1 1926-11-19 TRUE           1
2 1928-10-22 FALSE          2
3 1928-10-23 TRUE           2
4 2000-04-03 FALSE          3
5 2000-04-04 TRUE           3
6 2001-11-13 FALSE          4

Edit: Including a group_by will cause the subsequent mutate to treat each subset of rows as if they were separate data sets, starting the count over again for each group. Refer to R for Data Science for more details (https://r4ds.had.co.nz/transform.html#grouped-mutates-and-filters)

tibble(Date = as.Date(c("1926-11-19", "1928-10-22", "1928-10-23", 
                    "2000-04-03", "2000-04-04", "2001-11-13")),
   station = c("A", "A", "A", "B", "B", "B")) %>%
group_by(station) %>%
mutate(
    first_one = seq_along(Date) == 1,
    sequential = Date == dplyr::lag(Date) + 1,
    flood_counter = cumsum(first_one | !sequential)
) 

Result

# A tibble: 6 x 5
# Groups:   station [2]
  Date       station first_one sequential flood_counter
  <date>     <chr>   <lgl>     <lgl>              <int>
1 1926-11-19 A       TRUE      NA                     1
2 1928-10-22 A       FALSE     FALSE                  2
3 1928-10-23 A       FALSE     TRUE                   2
4 2000-04-03 B       TRUE      NA                     1
5 2000-04-04 B       FALSE     TRUE                   1
6 2001-11-13 B       FALSE     FALSE                  2
Damian
  • 1,385
  • 10
  • 10
  • Thanks for your help, that works perfectly! Just a question to understand fully this code: how does it know to start again from 0 when there is a new station in the data frame? This works on my data frame where I have several stations and when a new station appears I get 1 again for the id (without groupin before). – Jude Mar 30 '21 at 14:17
  • I updated the post, the short answer is that in effect the `group_by()` tells mutate where to start over – Damian Mar 30 '21 at 16:07