-1

I want to summarize the dataset based on "year", "months", and "subdist_id" columns. For each subdist_id, I want to get average values of "Rainfall" for the months 11,12,1,2 but for different years. For example, for subdist_id 81, the mean Rainfall value of 2004 will be the mean Rainfall of months 11, 12 of 2004, and months 1,2 of 2005.

I am getting no clue how to do it, although I searched online rigorously.

Sample image

camille
  • 16,432
  • 18
  • 38
  • 60
Evan
  • 25
  • 3
  • Can you upload reproducible data? You can run `dput("name_of_the_dataset")` and copy the output as a code to your question. – Bloxx Apr 03 '22 at 23:11
  • It's unclear what you're asking or what end result you want—a [reproducible example](https://stackoverflow.com/q/5963269/5325862) including what you've tried that didn't work would help, but whatever the issue is has likely been covered before. What you're describing seems like a rolling mean – camille Apr 04 '22 at 01:28

3 Answers3

2

Expanding on @Bloxx's answer and incorporating my comment:

# Set up example data frame:
df = data.frame(year=c(rep.int(2004,2),rep.int(2005,4)), 
                month=((0:5%%4)-2)%%12+1,
                Rainfall=seq(.5,by=0.15,length.out=6))

Now use mutate to create year2 variable:

df %>% mutate(year2 = year - (month<3)*1) # or similar depending on the problem specs

And now apply the groupby/summarise action:

df %>% mutate(year2 = year - (month<3)*1) %>% 
       group_by(year2) %>% 
       summarise(Rainfall = mean(Rainfall))
njp
  • 620
  • 1
  • 3
  • 16
0

Lets assume your dataset is called df. Is this what you are looking for?

df %>% group_by(subdist_id, year) %>% summarise(Rainfall = mean(Rainfall))
Bloxx
  • 1,495
  • 1
  • 9
  • 21
  • sorry for the confusion! The mean rainfall will depend on months that overlap in two years. For example, 2004 mean rainfall will be average of months 11,12 of 2004 and 1,2 of 2005. – Evan Apr 03 '22 at 23:32
  • What do you mean by overlap? – Bloxx Apr 03 '22 at 23:36
  • X year's mean rainfall should be the average of 11th and 12th months' rainfall of X year and 1st and 2nd months' rainfall of X+1 year. – Evan Apr 04 '22 at 00:02
  • Looks like you need to set up a new variable, say `year2` that is equal to the value of `year`-1 when `month` is 1 or 2 and then apply Bloxx's groupby/summarise operation. I often set up similar "water year" variables based on calendar year and month values. – njp Apr 04 '22 at 00:07
  • @njp that's the correct way I was looking for – Evan Apr 04 '22 at 04:38
0

I think you can simply do this:

df %>% filter(months %in% c(1,2,11,12)) %>%
  group_by(subdist_id, year=if_else(months %in% c(1,2),year-1,year)) %>% 
  summarize(meanRain = mean(Rainfall))

Output:

  subdist_id  year meanRain
       <dbl> <dbl>    <dbl>
1         81  2004    0.611
2         81  2005    0.228

Input:

df = data.frame(
  subdist_id = 81,
  year=c(2004,2004, 2005, 2005, 2005, 2005),
  months=c(11,12,1,2,11,12),
  Rainfall = c(.251,.333,.731,1.13,.111,.346)
)
langtang
  • 22,248
  • 1
  • 12
  • 27
  • This assumes that they'll only ever be interested in those 4 specific months, which seems unlikely – camille Apr 04 '22 at 01:30
  • true, but that's what the OP specifically stated. As in "For each subdist_id, I want to get average values of "Rainfall" for the months 11,12,1,2 but for different years." – langtang Apr 04 '22 at 01:32