I have a series of daily values, y. For each day, d_i (i.e., each row), I would like to calculate the (graph) area, a_i, of the region between the curve and the horizontal line y = y_i, bounded by d_i and the most recent previous occurrence of the value y_i. Sketch below. Because observations occur at regular, discrete timesteps (daily), a_i is equivalent to the sum of the daily differences between y_i and each daily y over that interval (black bars in figure). I'm interested only in valleys, so a_i can be set to 0 when y is not increasing (y_i - y_(i-1) <= 0).
Toy data below. Expected result shown in dat$a. dat$a[6] was calculated from 55 - 50; dat$a[7] was calculated from (60-55)+(60-50). And so on.
library(lubridate)  # provides as_date()
dat = data.frame(d = seq.Date(as_date("2021-01-01"), as_date("2021-01-10"), by = "1 day"),
                 y = c(100,95,90,70,50,55,60,75,85,90),
                 a = c(0,0,0,0,0,5,15,65,115,145))
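To make the intended calculation concrete, here is a naive base-R sketch of how I read my own definition. It assumes that "the most recent previous occurrence of y_i" can be taken as the most recent earlier day on which y was at or above y_i, and that the walk back simply stops at the start of the series if no such day exists. The function name valley_area is made up for illustration.

```r
# Toy series from the question
y <- c(100, 95, 90, 70, 50, 55, 60, 75, 85, 90)

# valley_area() is a hypothetical helper name, not from any package.
# For each rising day i, sum (y[i] - y[j]) over the earlier days j that
# sit below y[i], walking back until a day at or above y[i] is reached.
valley_area <- function(y) {
  n <- length(y)
  a <- numeric(n)
  for (i in seq_len(n)) {
    # Day 1 has no previous day; non-rising days are set to 0
    if (i == 1 || y[i] <= y[i - 1]) next
    j <- i - 1
    # Assumption: stop at the first earlier day with y >= y[i],
    # or at the start of the series if there is none
    while (j >= 1 && y[j] < y[i]) {
      a[i] <- a[i] + (y[i] - y[j])
      j <- j - 1
    }
  }
  a
}

a <- valley_area(y)
```

On the toy data this reproduces the expected column (0, 0, 0, 0, 0, 5, 15, 65, 115, 145), but the backward walk is quadratic in the worst case, so it may be slow on 180k rows with long valleys.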
My first thought was to calculate the area between the curve and the horizontal line y = y_i between day d_i and the most recent previous occurrence of the value y_i, perhaps using geiger::area.between.curves(), but I couldn't work out how to identify the most recent previous occurrence of the value y_i.
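One possible way to frame the lookup, as a rough sketch only: because the exact value y_i will rarely fall on a sampled day, this assumes that the most recent earlier day with y at or above y_i is an acceptable stand-in for the previous occurrence. The variable name prev_at_or_above is made up for illustration.

```r
# Toy series from the question
y <- c(100, 95, 90, 70, 50, 55, 60, 75, 85, 90)

# For each day i, the index of the most recent earlier day whose value is
# at or above y[i]; NA when no earlier day reaches that level.
prev_at_or_above <- vapply(seq_along(y), function(i) {
  hits <- which(y[seq_len(i - 1)] >= y[i])
  if (length(hits) == 0) NA_integer_ else max(hits)
}, integer(1))

prev_at_or_above
# NA 1 2 3 4 4 4 3 3 3
```

For day 8 (y = 75) this points at day 3 (y = 90), so the region of interest would span days 4 to 7. Like any all-pairs scan, it is quadratic in the number of rows.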
[In case the context helps, the actual data are daily values of the area (m2) of a wetland not submerged by water. When the water rises, a portion of the wetland that had been dry for some time becomes wet. Here, I'm trying to calculate the extent of the reflooding in m2-days. A portion of the wetland that has been dry for a long time but becomes reflooded will contribute many m2-days to the sum.]
I'm most comfortable in the tidyverse, and such answers are greatly preferred. I am not familiar with data.table.
Thanks in advance
Update
I was able to achieve my desired calculation in Excel, though it's brutally inelegant. An example with a couple hundred rows is linked below. Given that my real data are 180k rows, my poor machine hated the 18 million calculated cells. Though I can move on with my analysis, I am still very interested in an R solution. My implemented approach differs subtly from my imagined R approach in that it sums 'horizontal rectangles', so to speak, each of the same (small) y-unit height, rather than 'vertical rectangles', each of unit width.
Here's the file.