I have daily precipitation data in R. My dates are in the format 2008-01-01 and span 10 years. I am trying to aggregate from 2008-10-01 to 2009-09-30, but I am not sure how. Is there a way in aggregate() to set a start date for the aggregation and the grouping?

My current code is

data<- aggregate(data$total_snow_cm, by=list(data$year), FUN = 'sum')

but this gives me the total snowfall for each calendar year (January to December), whereas I want each total to run from October to the following September, e.g. October 2008 to September 2009.

RJAM
  • You need to share a small sample of your data and its expected output before anyone can provide any useful help. See [How to make a great R reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – Shree Jul 14 '19 at 00:20

2 Answers


Assuming your data are in long format, I'd do something like this:

 library(tidyverse)
 #lubridate is attached by library(tidyverse) only from tidyverse 2.0.0 on,
 #so load it explicitly for ymd(), month() and year()
 library(lubridate)

 #make sure R knows your dates are dates - you mention they're 'yyyy-mm-dd', so
 yourdataframe <- yourdataframe %>% 
                  mutate(yourcolumnforprecipdate = ymd(yourcolumnforprecipdate))


 #in this script or another, define a water year function:
 #Jan-Sep belong to the current year, Oct-Dec to the following one
 water_year <- function(date) {
               ifelse(month(date) < 10, year(date), year(date) + 1)}

 #new wateryear column for your data, using your new function
 yourdataframe <- yourdataframe %>% 
                  mutate(wateryear = water_year(yourcolumnforprecipdate))

 #now group by water year (and location if there's more than one) 
 #and sum and create new data.frame

 wy_sums <- yourdataframe %>% group_by(locationcolumn, wateryear) %>% 
            summarize(wy_totalprecip = sum(dailyprecip))

For more info, read up on lubridate, the tidyverse's great date-handling package, which is where the ymd() function comes from. There are others, such as ymd_hms(). mutate() is from the tidyverse's dplyr library. Both libraries are extremely useful!
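As a quick sanity check of the water-year rule, the function can be applied to dates on either side of the October boundary. This is a minimal sketch that assumes lubridate is installed; the `boundary_dates` vector is made up for illustration.

```r
library(lubridate)

# Same rule as above: Jan-Sep keep their calendar year, Oct-Dec roll forward
water_year <- function(date) {
  ifelse(month(date) < 10, year(date), year(date) + 1)
}

# Hypothetical dates straddling the start and end of water year 2009
boundary_dates <- as.Date(c("2008-09-30", "2008-10-01",
                            "2009-09-30", "2009-10-01"))
wy <- water_year(boundary_dates)

print(data.frame(date = boundary_dates, wateryear = wy))
```

Under this convention the four dates map to water years 2008, 2009, 2009 and 2010, so 2008-10-01 through 2009-09-30 ends up in a single group labelled 2009.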

dbo
  • Great! Welcome to SO! Just FYI, as @Shree mentioned, you'll by far get the best answers if you include some sample data (like a small piece of your timeseries) with your question. For this one, including ten or so days with precip, say from `2008-09-25` to `2008-10-05`, would be enough for others to quickly cut and paste your data and come up with solutions. – dbo Jul 14 '19 at 01:08

I'd like to answer the question as it was asked, i.e. with aggregate().

You can wrap aggregate() in with() and pass with() a subset of the data. Date intervals can be defined with comparison operators, just as with numbers. (Note that in the sample data below, the year column holds the daily dates.)

df1.agg <- with(df1[as.Date("2008-10-01") <= df1$year & df1$year <= as.Date("2009-09-30"), ], 
                aggregate(total_snow_cm, by=list(year), FUN=sum))

Another way is to use aggregate()'s formula interface, where the data, and hence also the interval, can be specified inside the aggregate() call.

df1.agg <- aggregate(total_snow_cm ~ year, 
                     data=df1[as.Date("2008-10-01") <= df1$year & 
                                df1$year <= as.Date("2009-09-30"), ], FUN=sum)

Result

head(df1.agg)
#         year total_snow_cm
# 1 2008-10-01           171
# 2 2008-10-02           226
# 3 2008-10-03           182
# 4 2008-10-04           129
# 5 2008-10-05           135
# 6 2008-10-06           222

Data

set.seed(42)
df1 <- data.frame(total_snow_cm=sample(120:240, 4018, replace=TRUE),
                  year=seq(as.Date("2000-01-01"),as.Date("2010-12-31"), by="day"))
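
To get one total per October-to-September period across all years with aggregate(), a grouping column can be derived from the dates first. The following is a minimal base-R sketch under the assumption that the data look like df1 above; the helper `water_year_base` and the column `wy` are hypothetical names introduced only for illustration.

```r
# Sample data as in the Data block; note that `year` holds daily dates
set.seed(42)
df1 <- data.frame(total_snow_cm = sample(120:240, 4018, replace = TRUE),
                  year = seq(as.Date("2000-01-01"), as.Date("2010-12-31"), by = "day"))

# Hypothetical helper: Oct-Dec dates are assigned to the following year
water_year_base <- function(date) {
  y <- as.integer(format(date, "%Y"))
  m <- as.integer(format(date, "%m"))
  ifelse(m >= 10, y + 1L, y)
}

df1$wy <- water_year_base(df1$year)

# One row per water year, summed with the formula interface
wy.agg <- aggregate(total_snow_cm ~ wy, data = df1, FUN = sum)

# Cross-check water year 2009 against a direct sum over the date interval
check <- sum(df1$total_snow_cm[df1$year >= as.Date("2008-10-01") &
                               df1$year <= as.Date("2009-09-30")])
stopifnot(wy.agg$total_snow_cm[wy.agg$wy == 2009] == check)
```

With this sample data the first and last groups (2000 and 2011) cover only part of a water year, so they may need to be dropped before comparing totals.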
jay.sf