
*Before posting, I went through this post, but its approach did NOT work with the Date format in my data:

Using R & dplyr to summarize - group_by, count, mean, sd*

---------------------------------------------------------------------

What I have:

I have a data frame with two columns ("Date" and "Average") that contains the daily average precipitation for 5 years (2010 to 2014).

Here are the head and tail of this data frame:

> head(years_nc)
    Date    Average
1 2010-01-01 0.00207909
2 2010-01-02 0.00207909
3 2010-01-03 0.00207909
4 2010-01-04 0.00207909
5 2010-01-05 0.00207909
6 2010-01-06 0.00207909

> tail(years_nc)
          Date     Average
3334271 2014-12-26 0.004983558
3334272 2014-12-27 0.004983558
3334273 2014-12-28 0.004983558
3334274 2014-12-29 0.004983558
3334275 2014-12-30 0.004983558
3334276 2014-12-31 0.004983558

To make things clearer, you can download this data frame:

https://www.dropbox.com/s/7wozzxvu6uckqsu/MyData.csv?dl=1

My Goal:

I am trying to compute the mean of the "Average" column for each year separately.

This is my code to do so:

library(dplyr)
library(lubridate)

years_nc %>%
  group_by(Date) %>%
  summarize(avg_preci = mean(Average, na.rm = TRUE))

It returns only one value:

> 
   avg_preci
1 0.00195859

But I want R to:

(a) make a group for each year;

(b) then calculate the mean of the Average precipitation on a yearly basis.

In other words, I should get 5 mean values, one per year.
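To pin down what "one value per year" means, here is a minimal sketch on a small invented data frame in the same two-column shape (`toy` and its values are hypothetical, not taken from MyData.csv). It uses base R only, swapping in `format(Date, "%Y")` to extract the year and `tapply()` to average within each year, so it is an illustration of the expected result rather than the dplyr approach discussed below.

```r
# Hypothetical toy data in the same shape as years_nc (values invented)
toy <- data.frame(
  Date    = as.Date(c("2010-01-01", "2010-06-01", "2011-01-01", "2011-06-01")),
  Average = c(0.001, 0.003, 0.002, 0.004)
)

# Extract the year as a string and average "Average" within each year
yearly <- tapply(toy$Average, format(toy$Date, "%Y"), mean, na.rm = TRUE)
print(yearly)
```

With two years in the toy data, the result has two named values, one per year; the real data would give five.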

What is my mistake in the code?

Could anybody help me with this problem?

Thanks.

Canada2015
  • You only missed `year(Date)` in `years_nc %>% group_by(year(Date)) %>% summarize(avg_preci = mean(Average, na.rm = TRUE))` – Vitali Avagyan Sep 28 '19 at 16:02
  • The code as written should at least give you an average for each date (though not each year). Is there any chance that you loaded the older plyr package along with dplyr? Try changing your call to `summarize()` to `dplyr::summarize()` and see if it now respects your date grouping. – jdobres Sep 28 '19 at 16:04
  • All of you are right. Your comments, which are also reflected in @deepseefan's answer, are correct. Thanks. – Canada2015 Sep 29 '19 at 02:36
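The suggestion about plyr can be sketched as follows. This is a minimal example on an invented data frame (`toy` and its values are hypothetical) and assumes dplyr and lubridate are installed. Writing `dplyr::summarize()` with the explicit namespace means the grouped dplyr version is used even if plyr, whose `summarize()` does not respect dplyr groups, was attached afterwards; `"package:plyr" %in% search()` is one way to check whether plyr is attached at all.

```r
library(dplyr)
library(lubridate)

# Hypothetical toy data in the same shape as years_nc (values invented)
toy <- data.frame(
  Date    = as.Date(c("2010-01-01", "2010-06-01", "2011-01-01", "2011-06-01")),
  Average = c(0.001, 0.003, 0.002, 0.004)
)

# TRUE if plyr is attached; its summarize() ignores dplyr's group_by() groups
print("package:plyr" %in% search())

# The explicit namespace guarantees dplyr's grouped summarize() is called
yearly <- toy %>%
  group_by(year = year(Date)) %>%
  dplyr::summarize(avg_preci = mean(Average, na.rm = TRUE))
print(yearly)
```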

1 Answer


You're almost on the right track. First, make sure that your Date column is actually of class Date. Then, when you group, group by the year only, not by the full year-month-day date that is in your data frame. The script can be modified as follows:

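library(dplyr)
library(lubridate)

# Make sure Date is of class Date (it may have been read in as character)
years_nc$Date <- ymd(years_nc$Date)

# Group by the year component only, then average within each year
years_nc %>%
  group_by(year(Date)) %>%
  summarize(avg_preci = mean(Average, na.rm = TRUE))
# # A tibble: 5 x 2
#     `year(Date)` avg_preci
#           <dbl>     <dbl>
# 1         2010   0.00196
# 2         2011   0.00196
# 3         2012   0.00196
# 4         2013   0.00196
# 5         2014   0.00196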
deepseefan
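To see why grouping by the full date does not give yearly means, the sketch below uses invented toy data (`toy`, `by_day` and `by_year` are hypothetical names; dplyr and lubridate are assumed to be installed). It converts a character Date column with `ymd()` as in the answer, then compares grouping by `Date` with grouping by `year(Date)`. Naming the grouping column (`year = year(Date)`) is optional; it only gives the result a cleaner column name than `year(Date)`.

```r
library(dplyr)
library(lubridate)

# Hypothetical toy data; Date starts as character, as it might after read.csv()
toy <- data.frame(
  Date    = c("2010-01-01", "2010-01-01", "2010-01-02", "2011-01-01"),
  Average = c(0.001, 0.003, 0.002, 0.004),
  stringsAsFactors = FALSE
)
toy$Date <- ymd(toy$Date)  # character -> Date

# One row per distinct day: this is what group_by(Date) produces
by_day <- toy %>%
  group_by(Date) %>%
  summarize(avg_preci = mean(Average, na.rm = TRUE))

# One row per year: group on the year component only
by_year <- toy %>%
  group_by(year = year(Date)) %>%
  summarize(avg_preci = mean(Average, na.rm = TRUE))

print(by_day)
print(by_year)
```

On the real data, the first form would return one row per day, while the second returns the five yearly means the question asks for.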