How to calculate average of values from specific days each year for multiple years in R?

Question

I want to calculate the average temperature (t) over a specific time period for each year.

I have weather data with a value for each day. My real data covers 2011-2019 and includes every day of every year. I would like, for example, the average temperature from 20 April to 15 May for each year.

Example data:

df <- data.frame(matrix(ncol = 4, nrow = 8))
x <- c("year", "month","day","t")
colnames(df) <- x
df$year <- c(2011,2011,2011,2011,2012,2012,2012,2012)
df$month <- c(3,3,4,4,3,3,4,4)
df$day <- c(1,2,3,4,1,2,3,4)
df$t <- c(1,3,6,1,2,7,1,-9)

I did manage to do this with very ugly and time-consuming code, but lack of knowledge has stopped me in my tracks.

Thank you in advance.

Don't share data as images, use dput() instead to create a reproducible example. More ideas here: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example — s_baldur, Mar 25 '20 at 11:24
I used as.Date to make a column in "%Y-%m-%d" format, selected each time period, took the average of each one manually and combined the results into a data frame. I'm a beginner, so this was both time-consuming and very ugly code. I was searching for a better way. All help is welcome. — Sölvi Vignisson, Mar 25 '20 at 11:58

score 3 · Answer 1 · answered Mar 25 '20 at 12:10

With tidyverse you could do something similar:

library(tidyverse)

df %>%
  filter((month == 4 & day >= 20) |
         (month == 5 & day <= 15)) %>%
  group_by(year) %>%
  summarise(mean_temp = mean(t))

Ben

score 2 · Answer 2 · answered Mar 25 '20 at 13:47

Similar to @Ben's answer, but in base R:

aggregate(t~year, subset(df, (month == 4 & day >= 20) | 
                             (month == 5 & day <= 15)), mean)
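
The example data in the question contains no days between 20 April and 15 May, so the filter above has nothing to select there. As a minimal sketch on made-up data (the `weather` data frame and the `in_window()` helper are hypothetical names, not part of any answer), month and day can also be combined into a single number, which keeps the condition short when the window spans more than two months. It assumes the window does not cross the end of the year.

```r
# Hypothetical daily data: every day of 2011 and 2012 with placeholder temperatures
dates <- seq(as.Date("2011-01-01"), as.Date("2012-12-31"), by = "day")
weather <- data.frame(
  year  = as.integer(format(dates, "%Y")),
  month = as.integer(format(dates, "%m")),
  day   = as.integer(format(dates, "%d")),
  t     = as.integer(format(dates, "%j")) %% 10  # made-up values, not real measurements
)

# Hypothetical helper: TRUE when (month, day) lies inside the window, inclusive.
# month * 100 + day turns 20 April into 420 and 15 May into 515.
in_window <- function(month, day, start = 420, end = 515) {
  key <- month * 100 + day
  key >= start & key <= end
}

# Keep only the rows inside the window, then average t per year
selected <- weather[in_window(weather$month, weather$day), ]
result <- aggregate(t ~ year, data = selected, FUN = mean)
print(result)
```

If the real data contains missing temperatures, passing `na.rm = TRUE` through to `mean` is probably worth considering.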

Ronak Shah

score 1 · Answer 3 · answered Mar 25 '20 at 12:55

You can add fairly complex calculations directly inside the group_by function in the dplyr package. Maybe you want to look into something like this:

library(dplyr)
library(lubridate)
df <- data.frame(matrix(ncol = 4, nrow = 8))

x <- c("year", "month","day","t")
colnames(df) <- x
df$year <- c(2011,2011,2011,2011,2012,2012,2012,2012)
df$month <- c(3,3,4,4,3,3,4,4)
df$day <- c(1,2,3,4,1,2,3,4)
df$t <- c(1,3,6,1,2,7,1,-9)
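df %>% 
  # Group by year and by whether the day of the year falls inside the period;
  # the year 2000 only serves as a reference year for the period limits
  group_by(year,
           in_period = lubridate::dmy(paste(day, month, year)) %>% 
             lubridate::yday() %>% 
             between(lubridate::yday(dmy("3.4.2000")), lubridate::yday(dmy("15.5.2000")))) %>% 
  summarise(mean_t = mean(t))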

I am using the yday function from lubridate, which returns the day of the year, so that the same period can be selected in every year.
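
One caveat with day-of-year comparisons: after February, the same calendar date has a day-of-year value that is one higher in leap years than in other years. The small check below illustrates the shift. It uses base R's `format(x, "%j")` instead of lubridate only so that it runs without extra packages, and the helper names `doy()` and `md()` are made up for this sketch.

```r
# Day of year for a date given as "YYYY-MM-DD"
doy <- function(x) as.integer(format(as.Date(x), "%j"))

doy("2011-04-20")  # 110 (2011 is not a leap year)
doy("2012-04-20")  # 111 (2012 is a leap year)

# Comparing the zero-padded "month-day" part directly avoids the shift
md <- function(x) format(as.Date(x), "%m-%d")
md("2011-04-20") == md("2012-04-20")  # TRUE
```

Since a reference year such as 2000 is itself a leap year, limits derived from it can be off by one day when applied to non-leap years.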

Hope this helps!!

score 0 · Answer 4 · answered Mar 25 '20 at 12:08

Try the code below. I like to use a for loop to deal with this kind of trouble.

# Create a vector of all years
year_u <- unique(df$year)

# Create the initial and final period
inicial_day <- 20
inicial_month <- 4

final_day <- 15
final_month <- 5

# Create an empty data.frame to store the data after each loop
averages <- data.frame()

# Open a loop over the years
for(i in seq_along(year_u)){

    # take each year
    subsets <- subset(df, year == year_u[i])

    # Rows on or after the initial date and on or before the final date
    in_period <- (subsets$month > inicial_month |
                      (subsets$month == inicial_month & subsets$day >= inicial_day)) &
                 (subsets$month < final_month |
                      (subsets$month == final_month & subsets$day <= final_day))

    # Mean of t within the period
    average <- mean(subsets$t[in_period])

    # Create a temporary data.frame to store the year and the t_mean
    temp <- data.frame(year = year_u[i], t_mean = average)

    # Combine the actual data with the last
    averages <- rbind(averages, temp)
}

In the end, the data.frame averages will have the years and the t_means of the periods. — Leonardo Donato Nunes, Mar 25 '20 at 12:09
