-1

I have hourly precipitation data and I would like to identify storms based on a user-specified gap tolerance of no rain. Below is an excerpt of my data.

I have followed the approach in the post "Calculating precipitation intensity in R using the rle function", with one critical difference: I want to specify a tolerance for consecutive zeros (e.g. less than 6 hours) within which the dry hours stay in the same storm ID. The approach in that post would give 3 different storms (i.e. 3 separate runs of precip > 0 in the data below), but I want only 1 storm, because none of the dry gaps between the wet periods is longer than 6 hours. Put another way, a gap of more than 6 hours with no rain starts a new storm.

Thanks!

structure(list(time_local = c("2021-05-04 21:00:00", "2021-05-04 22:00:00", 
"2021-05-04 23:00:00", "2021-05-05 00:00:00", "2021-05-05 01:00:00", 
"2021-05-05 02:00:00", "2021-05-05 03:00:00", "2021-05-05 04:00:00", 
"2021-05-05 05:00:00", "2021-05-05 06:00:00", "2021-05-05 07:00:00", 
"2021-05-05 08:00:00", "2021-05-05 09:00:00", "2021-05-05 10:00:00", 
"2021-05-05 11:00:00", "2021-05-05 12:00:00", "2021-05-05 13:00:00", 
"2021-05-05 14:00:00", "2021-05-05 15:00:00", "2021-05-05 16:00:00", 
"2021-05-05 17:00:00", "2021-05-05 18:00:00", "2021-05-05 19:00:00", 
"2021-05-05 20:00:00", "2021-05-05 21:00:00", "2021-05-05 22:00:00", 
"2021-05-05 23:00:00", "2021-05-06 00:00:00", "2021-05-06 01:00:00", 
"2021-05-06 02:00:00", "2021-05-06 03:00:00", "2021-05-06 04:00:00", 
"2021-05-06 05:00:00", "2021-05-06 06:00:00", "2021-05-06 07:00:00"
), prcp = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0.3, 0, 0, 0, 0, 0.3, 
0.5, 0.5, 1.3, 0.5, 1.8, 0.3, 0, 0, 1.8, 0.8, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0)), row.names = 46:80, class = "data.frame")

DarwinsBeard
    It is [off-topic](https://stackoverflow.com/help/on-topic) on StackOverflow to ask for recommendations such as *some method (or package if anyone knows of one!)* or *a tidyverse solution*. This can lead to opinionated, non-objective answers, even spam. Ideally, you [research](https://meta.stackoverflow.com/q/261592/1422451), find a method, and make an earnest attempt, which we will then help you troubleshoot. – Parfait May 21 '21 at 21:07

2 Answers

3

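Here's a dplyr approach, assuming the question's data frame is stored as rain. I compute a rolling 6-hour precipitation total and start a new group whenever that total changes and either the current or the previous total is zero, i.e. whenever the rolling total moves to or from zero.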

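library(dplyr)
tol <- 6  # length of the rolling window, in hours
rain %>%
  mutate(
    # total over the current hour and the preceding tol - 1 hours
    amt_6hr = cumsum(prcp) - cumsum(lag(prcp, tol, default = 0)),
    # TRUE when the rolling total moves to or from zero
    new_period = amt_6hr != lag(amt_6hr, default = 0) &
      pmin(amt_6hr, lag(amt_6hr, default = 0)) == 0,
    group = cumsum(new_period)
  )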



            time_local prcp amt_6hr new_period group
46 2021-05-04 21:00:00  0.0     0.0      FALSE     0
47 2021-05-04 22:00:00  0.0     0.0      FALSE     0
48 2021-05-04 23:00:00  0.0     0.0      FALSE     0
49 2021-05-05 00:00:00  0.0     0.0      FALSE     0
50 2021-05-05 01:00:00  0.0     0.0      FALSE     0
51 2021-05-05 02:00:00  0.0     0.0      FALSE     0
52 2021-05-05 03:00:00  0.0     0.0      FALSE     0
53 2021-05-05 04:00:00  0.0     0.0      FALSE     0
54 2021-05-05 05:00:00  0.0     0.0      FALSE     0
55 2021-05-05 06:00:00  0.3     0.3       TRUE     1
56 2021-05-05 07:00:00  0.0     0.3      FALSE     1
57 2021-05-05 08:00:00  0.0     0.3      FALSE     1
58 2021-05-05 09:00:00  0.0     0.3      FALSE     1
59 2021-05-05 10:00:00  0.0     0.3      FALSE     1
60 2021-05-05 11:00:00  0.3     0.6      FALSE     1
61 2021-05-05 12:00:00  0.5     0.8      FALSE     1
62 2021-05-05 13:00:00  0.5     1.3      FALSE     1
63 2021-05-05 14:00:00  1.3     2.6      FALSE     1
64 2021-05-05 15:00:00  0.5     3.1      FALSE     1
65 2021-05-05 16:00:00  1.8     4.9      FALSE     1
66 2021-05-05 17:00:00  0.3     4.9      FALSE     1
67 2021-05-05 18:00:00  0.0     4.4      FALSE     1
68 2021-05-05 19:00:00  0.0     3.9      FALSE     1
69 2021-05-05 20:00:00  1.8     4.4      FALSE     1
70 2021-05-05 21:00:00  0.8     4.7      FALSE     1
71 2021-05-05 22:00:00  0.0     2.9      FALSE     1
72 2021-05-05 23:00:00  0.0     2.6      FALSE     1
73 2021-05-06 00:00:00  0.0     2.6      FALSE     1
74 2021-05-06 01:00:00  0.0     2.6      FALSE     1
75 2021-05-06 02:00:00  0.0     0.8      FALSE     1
76 2021-05-06 03:00:00  0.0     0.0       TRUE     2
77 2021-05-06 04:00:00  0.0     0.0      FALSE     2
78 2021-05-06 05:00:00  0.0     0.0      FALSE     2
79 2021-05-06 06:00:00  0.0     0.0      FALSE     2
80 2021-05-06 07:00:00  0.0     0.0      FALSE     2
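
As a sanity check on the rolling total used above, here is a minimal base R sketch that compares the cumulative-sum difference against a direct window sum. The names `lagged`, `amt_cumsum` and `amt_direct` are made up for this illustration, and `prcp` is simply the column from the question copied into a vector.

```r
# Hourly precipitation from the question (35 values)
prcp <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0.3, 0, 0, 0, 0, 0.3,
          0.5, 0.5, 1.3, 0.5, 1.8, 0.3, 0, 0, 1.8, 0.8, 0, 0, 0, 0, 0,
          0, 0, 0, 0, 0)
tol <- 6  # window length in hours

# Base R stand-in for dplyr::lag(prcp, tol, default = 0):
# shift the series forward by tol hours and pad the start with zeros
lagged <- c(rep(0, tol), head(prcp, -tol))

# Rolling total as a difference of two running totals
amt_cumsum <- cumsum(prcp) - cumsum(lagged)

# The same rolling total computed directly, window by window
amt_direct <- vapply(
  seq_along(prcp),
  function(i) sum(prcp[max(1, i - tol + 1):i]),
  numeric(1)
)

# Compare with a numerical tolerance rather than exact equality
print(all.equal(amt_cumsum, amt_direct))
```

Because the cumulative version subtracts two running totals, a long series could leave tiny floating-point residue where the window is actually dry, so it may be safer to compare the rolling total against a small threshold than against exactly 0.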
Jon Spring
2

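One option is to use rle (run-length encoding) and flag each dry run longer than 6 hours as a break between storms; the cumulative sum of those flags then gives the storm ID. This assumes the data frame is stored as df, and the result has one row per run rather than one row per hour.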

library(dplyr)

df %>%
  pull(prcp) %>%
  rle %>%
  unclass %>%
  data.frame %>%
  mutate(storm_break = +(values == 0 & lengths > 6),
         storm_id = cumsum(storm_break))


#-----------------------
   lengths values storm_break storm_id
1        9    0.0           1        1
2        1    0.3           0        1
3        4    0.0           0        1
4        1    0.3           0        1
5        2    0.5           0        1
6        1    1.3           0        1
7        1    0.5           0        1
8        1    1.8           0        1
9        1    0.3           0        1
10       2    0.0           0        1
11       1    1.8           0        1
12       1    0.8           0        1
13      10    0.0           1        2
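
The table above has one row per run, so to label every hourly record the run-level IDs have to be repeated by run length. A minimal base R sketch of that step follows. The names `runs`, `run_id`, `is_break` and `storm_id` are illustrative, and setting the long dry runs to NA is an assumption about how they should be labelled, not something stated in the question.

```r
# Hourly precipitation from the question (35 values)
prcp <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0.3, 0, 0, 0, 0, 0.3,
          0.5, 0.5, 1.3, 0.5, 1.8, 0.3, 0, 0, 1.8, 0.8, 0, 0, 0, 0, 0,
          0, 0, 0, 0, 0)
tol <- 6  # dry runs longer than this many hours separate storms

runs <- rle(prcp)

# A run is a storm break if it is dry and longer than the tolerance
storm_break <- runs$values == 0 & runs$lengths > tol

# Each break increments the ID carried by the runs that follow it
run_id <- cumsum(storm_break)

# Expand from one value per run to one value per hour
storm_id <- rep(run_id, runs$lengths)
is_break <- rep(storm_break, runs$lengths)

# Assumption: hours inside a long dry run belong to no storm
storm_id[is_break] <- NA

print(data.frame(prcp, storm_id))
```

With the sample data this leaves a single storm covering the hours from the first 0.3 reading to the final 0.8 reading, including the short dry gaps between them.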
nniloc