I have a dataset with rainfall records since 2003. Another dataset contains the sampling dates from 2003 until now. I want to sum the amount of rain between consecutive sampling dates (see the object called date.per.year).
I found this, but I want to use a vector of break values: c1 = sum(rain in interval [X, Y[), c2 = sum(rain in interval [Y, Z[), c3 = sum(rain in interval [Z, A[), etc.
date.per.year = structure(c(12110, 12460, 12815, 13196, 13564.5, 13930, 14321,
14652, 15028, 15408, 15792, 16106), .Names = c("2003", "2004",
"2005", "2006", "2007", "2008", "2009", "2010", "2011", "2012",
"2013", "2014"))
Imagine that the Date and rain data frame is this:
df = data.frame(Dates = seq(as.Date("2003/1/1"),
as.Date("2015/1/1"), "days"),
rain = rnorm(length(seq(as.Date("2003/1/1"), as.Date("2015/1/1"), "days"))))
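To compare date.per.year with the Dates column, the numbers can be read as days since 1970-01-01, which is how R stores Date values internally. That reading is an assumption about how the vector was built, and the name sampling.dates is only illustrative. A minimal sketch:

```r
# Sampling dates copied from above
date.per.year <- structure(c(12110, 12460, 12815, 13196, 13564.5, 13930, 14321,
                             14652, 15028, 15408, 15792, 16106),
                           .Names = as.character(2003:2014))

# Assumption: the values are days since 1970-01-01.
# floor() drops the half day in 13564.5 before converting.
sampling.dates <- as.Date(floor(date.per.year), origin = "1970-01-01")
print(sampling.dates)
```

Under that assumption, every sampling date falls inside the range covered by df$Dates.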
I also tried this, but it's not creating bins that are usable:
## create corresponding intervals
splits <- cut(date.per.year, median, breaks=date.per.year)
## split df$rain into intervals and sum them
lapply(split(df$rain, f=splits), sum)
Warning message:
In split.default(df$rain, f = splits) :
data length is not a multiple of split variable
Or even this:
library(data.table)
DT <- data.table(df)
setkey(DT, rain, Dates)
DT[, sumSum := DT[ .(.BY[[1]], .d+(-5:-1) )][, sum(sum, na.rm=TRUE)] , by=list(date.per.year, .d=Dates)]
Error in `[.data.table`(DT, , `:=`(sumSum, DT[.(.BY[[1]], .d + (-5:-1))][, : The items in the 'by' or 'keyby' list are length (12,4384). Each must be same length as rows in x or number of rows returned by i (4384).
DT
An illustration of what I want to do is below. Imagine that the red lines are the dates that define the ranges I want to sum (these are the values in the date.per.year object). In the end, I should have 11 sums, one for each range. Is it possible to do this?
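A minimal sketch of the intended result, assuming the values in date.per.year are days since 1970-01-01 and that each range is half-open, [start, end[. It uses base R's cut() and tapply(); the names days, bins, range.labels, and rain.sums, and the call to set.seed(), are illustrative additions and not part of the original data.

```r
# Sampling dates, copied from the question
date.per.year <- structure(c(12110, 12460, 12815, 13196, 13564.5, 13930, 14321,
                             14652, 15028, 15408, 15792, 16106),
                           .Names = as.character(2003:2014))

# Simulated rain data as in the question; set.seed() only makes it reproducible
set.seed(1)
days <- seq(as.Date("2003/1/1"), as.Date("2015/1/1"), "days")
df <- data.frame(Dates = days, rain = rnorm(length(days)))

# One label per range, e.g. "2003-2004" (hypothetical naming scheme)
yrs <- names(date.per.year)
range.labels <- paste(head(yrs, -1), tail(yrs, -1), sep = "-")

# Assign every day to the range it falls in. right = FALSE makes the
# ranges half-open: [start, end[. Days outside all ranges become NA.
bins <- cut(as.numeric(df$Dates), breaks = date.per.year,
            labels = range.labels, right = FALSE)

# Sum the rain per range; tapply() ignores rows whose bin is NA
rain.sums <- tapply(df$rain, bins, sum)
print(rain.sums)
```

The key difference from the split() attempt is that cut() is applied to the dates in df, so the grouping factor has one entry per row of df instead of one entry per sampling date.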