0

I am working on daily rainfall data and trying to evaluate the extreme events from the time series data above a certain threshold value in each month per year i.e. the number of times the rainfall exceeded a certain threshold in each month per year.

The rainfall timeseries data is from St Lucia and has two columns:

  1. "YEARMODA" - defining the time (format- YYYYMMDD)

  2. "PREP" - rainfall in mm (numeric)

    StLucia <- read_excel("C:/Users/hp/Desktop/StLuciaProject.xlsx")

The dataframe which I'm working i.e "Precip1" on has two columns namely:

  1. Time (format YYYY-MM-DD)

  2. Precipitation (numeric value)

The code is provided below:

library("imputeTS")
StLucia$YEARMODA <- as.Date(as.character(StLucia$YEARMODA), format = "%Y%m%d")

data1 <- na_ma(StLucia$PREP, k=4, weighting = "exponential")

Precip1 <- data.frame(Time= StLucia$YEARMODA, Precipitation= data1, check.rows = TRUE)

I found out the threshold value based on the 95th percentile and 99th percentile using function quantile().

I now want to count the number of "extreme events" of rainfall above this threshold in each month on per year basis.

Please help me out on this. I would be highly obliged by your help. Thank You!

Steffen Moritz
  • 7,277
  • 11
  • 36
  • 55
  • 3
    It would be helpful i you could provide a subset of your data (e.g. using `dput`) to make this problem reproducible -- much easier for people to help when they can run it on their own machines. Thanks :) – mysteRious Jun 05 '18 at 00:15
  • @mysteRious, I'm not able to send the data from there. Can you please elaborate the process to me for doing that. Thanks! – Karan Chaudhary Jun 05 '18 at 15:17
  • See this question and its answers: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Calum You Jun 05 '18 at 18:30

1 Answers1

1

If you are open to a tidyverse method, here is an example with the economics dataset that is built into ggplot2. We can use ntile to assign a percentile group to each observation. Then we group_by the year, and get a count of the values that are in the desired percentiles. Because this is monthly data the counts are pretty low, but it's easily translated to daily data.

library(tidyverse)
thresholds <- economics %>%
  mutate(
    pctile = ntile(unemploy, 100),
    year = year(date)
    ) %>%
  group_by(year) %>%
  summarise(
    q95 = sum(pctile >= 95L),
    q99 = sum(pctile >= 99L)
    )
arrange(thresholds, desc(q95))
#> # A tibble: 49 x 3
#>     year   q95   q99
#>    <dbl> <int> <int>
#>  1  2010    12     6
#>  2  2011    12     0
#>  3  2009    10     5
#>  4  1967     0     0
#>  5  1968     0     0
#>  6  1969     0     0
#>  7  1970     0     0
#>  8  1971     0     0
#>  9  1972     0     0
#> 10  1973     0     0
#> # ... with 39 more rows

Created on 2018-06-04 by the reprex package (v0.2.0).

Calum You
  • 14,687
  • 4
  • 23
  • 42
  • can you explain me this code ' pctile = ntile(unemploy, 100)' which you have mentioned above. Thanks – Karan Chaudhary Jun 05 '18 at 15:13
  • Also, is it possible to set the threshold in the code **manually** without using the percentiles? – Karan Chaudhary Jun 05 '18 at 15:16
  • `ntile` takes arguments `x`, a vector, and `n`, a number of buckets. It returns a vector of same length as `x` with values corresponding to the bucket that the values of `x` fit in. So with `n = 100`, the smallest 1% of values get `pctile = 1`, the 95th % gets `pctile = 95`. So we just count the number above the 95th percentile. – Calum You Jun 05 '18 at 18:33
  • Yes you can set a manual threshold, just use `sum(unemploy > threshold)` inside summarise – Calum You Jun 05 '18 at 18:34
  • Also mam, is it possible to know the number of peaks above the defined threshold at Six Months intervals i.e. the number of peaks that are above the threshold in the first six months are independent from the peaks present in next six months of a year. I want to do this since I want to characterise the peaks above the threshold in dry and wet season respectively in a year. – Karan Chaudhary Jun 06 '18 at 20:47