0

I want to calculate the average temperature (t) over a specific time period for each year.

I have weather data with one value per day. My real data covers 2011-2019 and includes every day of every year. For example, I would like the average temperature from 20 April to 15 May for each year.

Example data:

df <- data.frame(matrix(ncol = 4, nrow = 8))
x <- c("year", "month","day","t")
colnames(df) <- x
df$year <- c(2011,2011,2011,2011,2012,2012,2012,2012)
df$month <- c(3,3,4,4,3,3,4,4)
df$day <- c(1,2,3,4,1,2,3,4)
df$t <- c(1,3,6,1,2,7,1,-9)

I did manage to do this with very ugly and time-consuming code, but my lack of knowledge has stopped me in my tracks.

Thank you in advance.

  • Don't share data as images; use dput() instead to create a reproducible example. More ideas here: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – s_baldur Mar 25 '20 at 11:24
  • Moreover, what did you actually try? – jay.sf Mar 25 '20 at 11:50
  • I used as.Date to make a column in "%Y-%m-%d" format, selected each time period, took the average of each manually, and combined the results into a data frame. I'm a beginner, so this was both time-consuming and very ugly code. I was searching for a better way. All help is welcome. – Sölvi Vignisson Mar 25 '20 at 11:58

4 Answers

3

With tidyverse you could do something similar:

library(tidyverse)

df %>%
  filter((month == 4 & day >= 20) |
         (month == 5 & day <= 15)) %>%
  group_by(year) %>%
  summarise(mean_temp = mean(t))
Ben
2

Similar to @Ben's answer, but in base R:

aggregate(t~year, subset(df, (month == 4 & day >= 20) | 
                             (month == 5 & day <= 15)), mean)
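
The two-condition filter above relies on the period spanning exactly two adjacent months. As a minimal sketch of one way to generalise it, month and day can be encoded as a single number (month * 100 + day) and compared against the bounds. The helper `in_window()` and the `demo` data are hypothetical, made up for illustration only, and the helper assumes the period does not wrap around New Year.

```r
# Hypothetical helper: TRUE when (month, day) lies between the start and
# end dates (inclusive). Assumes the window does not wrap around New Year.
in_window <- function(month, day, start_month, start_day, end_month, end_day) {
  key <- month * 100 + day
  key >= start_month * 100 + start_day & key <= end_month * 100 + end_day
}

# Made-up data for illustration only (not the asker's real measurements)
demo <- data.frame(
  year  = c(2011, 2011, 2011, 2011, 2012, 2012, 2012, 2012),
  month = c(4, 4, 5, 5, 4, 4, 5, 5),
  day   = c(19, 20, 15, 16, 25, 30, 1, 20),
  t     = c(100, 2, 4, 100, 1, 3, 8, 100)
)

# Mean temperature per year for 20 April - 15 May
res <- aggregate(t ~ year,
                 data = subset(demo, in_window(month, day, 4, 20, 5, 15)),
                 FUN = mean)
res
```

With this made-up data, only the rows dated 20 April to 15 May contribute, so the extreme values of 100 placed just outside the period are ignored.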
Ronak Shah
1

You can actually put quite complex expressions inside the group_by function from the dplyr package. Maybe you want to look into something like this.

library(dplyr)
library(lubridate)
df <- data.frame(matrix(ncol = 4, nrow = 8))

x <- c("year", "month","day","t")
colnames(df) <- x
df$year <- c(2011,2011,2011,2011,2012,2012,2012,2012)
df$month <- c(3,3,4,4,3,3,4,4)
df$day <- c(1,2,3,4,1,2,3,4)
df$t <- c(1,3,6,1,2,7,1,-9)
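df %>% 
  mutate(doy = yday(dmy(paste(day, month, year)))) %>% 
  # Compute the window bounds in each row's own year, so leap years do not
  # shift them, and group by year as well to get one mean per year
  group_by(year,
           in_period = doy >= yday(dmy(paste(3, 4, year))) &
                       doy <= yday(dmy(paste(15, 5, year)))) %>% 
  summarise(mean_t = mean(t))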

I am using the yday function from lubridate to convert each date to a day of the year, so that the same period can be selected across multiple years.

Hope this helps!!
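
One caveat when comparing day-of-year values across years: after 29 February, the same calendar date has a day-of-year that is one higher in a leap year than in a non-leap year. The small check below is a sketch in base R; the helper `doy()` is a hypothetical stand-in for lubridate::yday(), used here only to avoid depending on the package.

```r
# Hypothetical helper: 1-based day of the year, computed in base R.
# as.POSIXlt()$yday is 0-based, hence the + 1.
doy <- function(d) as.POSIXlt(as.Date(d))$yday + 1

doy("2011-04-03")  # 93 (non-leap year)
doy("2012-04-03")  # 94 (leap year)
```

Because of this shift, fixed day-of-year bounds taken from one reference year can include or exclude a boundary day in other years.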

Bertil Baron
0

Try the code below. I like to use a for loop to deal with this kind of problem.

# Create a vector of all years
year_u <- unique(df$year)

# Define the start and end of the period
initial_day <- 20
initial_month <- 4

final_day <- 15
final_month <- 5

# Create an empty data.frame to store the result of each iteration
averages <- data.frame()

# Loop over the years
for(i in seq_along(year_u)){

    # Take each year
    subsets <- subset(df, year == year_u[i])

    # Rows inside the period (assumes the period spans two adjacent months)
    in_period <- (subsets$month == initial_month & subsets$day >= initial_day) |
                 (subsets$month == final_month & subsets$day <= final_day)

    # Mean temperature within the period
    average <- mean(subsets$t[in_period])

    # Create a temporary data.frame to store the year and the t_mean
    temp <- data.frame(year = year_u[i], t_mean = average)

    # Append the current result to the previous ones
    averages <- rbind(averages, temp)
}
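
The loop above grows the result with rbind on each iteration. For comparison, here is a minimal sketch of the same per-year mean computed in one step with tapply. The `demo` data is made up for illustration, and the period (20 April to 15 May) is assumed to span two adjacent months.

```r
# Made-up data for illustration only
demo <- data.frame(
  year  = c(2011, 2011, 2011, 2011, 2012, 2012, 2012, 2012),
  month = c(4, 4, 5, 5, 4, 4, 5, 5),
  day   = c(19, 20, 15, 16, 25, 30, 1, 20),
  t     = c(100, 2, 4, 100, 1, 3, 8, 100)
)

# Logical vector marking the rows from 20 April to 15 May
keep <- (demo$month == 4 & demo$day >= 20) |
        (demo$month == 5 & demo$day <= 15)

# Mean of t for each year, using only the rows inside the period
means <- tapply(demo$t[keep], demo$year[keep], mean)

# Same layout as the loop's result: one row per year
averages <- data.frame(year = as.numeric(names(means)),
                       t_mean = as.vector(means))
averages
```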