Calculating row means for intervals defined in another data frame?

Question

I have a data frame with 2 variables: "time" and "temperature". The variable "time" has the following format: "%Y-%m-%d %H:%M:%S". I would like to calculate the average temperature for each day and night. Days and nights should be defined by sunrise and sunset times, which are stored in a second data frame. This means that each day and night have different starting and ending times.

So my question is: how can I calculate the average temperature for each day and night, days and nights being defined by sunrise and sunset times?

The first data frame which contains the temperatures looks like this:

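# Build the time stamps first: inside data.frame(), length(time) would refer
# to the base function time(), not to the new column
times <- seq(
  as.POSIXct("2013-05-24 15:01:01"), 
  as.POSIXct("2013-06-02 03:31:01"), 
  by = "3 min"
)
time_temp_data <- data.frame(
  time = times,
  temp = seq(7.153, 36.809, length.out = length(times))
)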

And the second data frame, which contains the sunrise and sunset times, looks like this:

sunrise_sunset <- data.frame(
  event = rep(c("sunrise", "sunset"), 21),
  time = as.POSIXct(c("2013-05-18 03:59:01", "2013-05-18 22:07:01", "2013-05-19 03:57:01", "2013-05-19 22:09:01", "2013-05-20 03:55:01",
                             "2013-05-20 22:11:01", "2013-05-21 03:53:01",  "2013-05-21 22:13:01", "2013-05-22 03:51:01", "2013-05-22 22:15:01",
                             "2013-05-23 03:49:01", "2013-05-23 22:18:01", "2013-05-24 03:47:01", "2013-05-24 22:20:01", "2013-05-25 03:45:01",
                             "2013-05-25 22:22:01", "2013-05-26 03:44:01", "2013-05-26 22:24:01", "2013-05-27 03:42:01", "2013-05-27 22:26:01", 
                             "2013-05-28 03:40:01", "2013-05-28 22:27:01", "2013-05-29 03:38:01", "2013-05-29 22:29:01", "2013-05-30 03:37:01",
                             "2013-05-30 22:31:01", "2013-05-31 03:35:01", "2013-05-31 22:33:01", "2013-06-01 03:34:01", "2013-06-01 22:35:01",
                             "2013-06-02 03:32:01", "2013-06-02 22:36:01", "2013-06-03 03:31:01", "2013-06-03 22:38:01", "2013-06-04 03:30:01",
                             "2013-06-04 22:40:01", "2013-06-05 03:29:01", "2013-06-05 22:41:01", "2013-06-06 03:28:01", "2013-06-06 22:42:01",
                             "2013-06-07 03:28:01", "2013-06-07 22:44:01"))
)

One approach would be to merge the two data frames. However, the "time" columns of the two data frames do not contain the same values, so an exact-match merge does not work. Ideally I would merge them using comparison operators (≥, ≤), but I did not manage to do that.

EDIT
Question has been modified and is not considered too broad anymore. The example can be run out of the box.

See [here](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). Please provide an example of your data — Steve Bronder, Aug 12 '16 at 13:18
@Steve_Corrin Now that I edited my question and provided an example of my data, how can it be open again? — jvddorpe, Jan 19 '17 at 09:30

Richie Cotton · Answer 1 · 2016-08-12T14:01:25.627

This solution uses the dplyr package for manipulating data frames, lubridate for date-time manipulation, and magrittr for piping commands together.

library(dplyr)
library(lubridate)
library(magrittr)

Here's an example dataset:

time_temp_data <- data.frame(
  time = seq(
    as.POSIXct("2016-08-11"), 
    as.POSIXct("2016-08-12 23:00:00"),
    by = "1 hour"
  ),
  temp = rnorm(48)
)
sunrise_sunset_data <- data.frame(
  sunrise = as.POSIXct(c("2016-08-11 05:59:30", "2016-08-12 06:00:30")),
  sunset = as.POSIXct(c("2016-08-11 21:00:30", "2016-08-12 20:59:30"))
)

First we add columns ("mutate") to the datasets to split the date-times into dates and times of day.

time_temp_data %<>%
  mutate(
    date = floor_date(time, "day"),
    # units must be named: the third positional argument of difftime() is tz
    time_of_day = difftime(time, date, units = "hours")
  )

sunrise_sunset_data %<>%
  mutate(
    date = floor_date(sunrise, "day"),
    # units must be named: the third positional argument of difftime() is tz
    time_of_sunrise = difftime(sunrise, date, units = "hours"),
    time_of_sunset = difftime(sunset, date, units = "hours")
  )
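
As a small illustration of the time-of-day calculation, here is a base R sketch for a single value. The timestamp and the UTC time zone are made up for the example. The signature is difftime(time1, time2, tz, units), so units is best passed by name.

```r
# Hypothetical timestamp, fixed to UTC so the result does not depend on the local time zone
x <- as.POSIXct("2016-08-11 05:59:30", tz = "UTC")

# Midnight at the start of the same day
midnight <- as.POSIXct(trunc(x, "days"))

# Time elapsed since midnight, expressed in hours
time_of_day <- difftime(x, midnight, units = "hours")
as.numeric(time_of_day)
```

The result is a difftime object, and as.numeric() turns it into a plain number of hours if that is easier to compare.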

Then we join the time/temp data to the sunrise/sunset data:

all_data <- inner_join(time_temp_data, sunrise_sunset_data, by = "date")

Night-time is when the time of day is after sunset, or before sunrise.

all_data %<>%
  mutate(
    is_night = time_of_day > time_of_sunset | time_of_day < time_of_sunrise
  )

Now the mean temperature for each date and day/night time can be calculated by grouping on these variables and calculating summary stats.

all_data %>%
  group_by(date, is_night) %>%
  summarize(mean_temp = mean(temp))
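
Note that grouping by calendar date splits each night at midnight. If a night should instead run from one sunset to the next sunrise, as in the question, one possible alternative (not the method above) is to work directly on a long-format event table like the asker's and assign each reading to the most recent event with base R's findInterval(). The sketch below uses made-up toy data, and the names temps, events and result are hypothetical. It assumes the events are sorted by time.

```r
# Toy data (made up for illustration): one reading every 3 hours over two days
tz <- "UTC"
temps <- data.frame(
  time = seq(as.POSIXct("2013-05-24 00:00:00", tz = tz),
             as.POSIXct("2013-05-25 21:00:00", tz = tz),
             by = "3 hours")
)
temps$temp <- seq_len(nrow(temps))

# Sunrise/sunset events in long format, sorted by time
events <- data.frame(
  event = rep(c("sunrise", "sunset"), 2),
  time = as.POSIXct(c("2013-05-24 03:47:01", "2013-05-24 22:20:01",
                      "2013-05-25 03:45:01", "2013-05-25 22:22:01"), tz = tz)
)

# Index of the most recent event at or before each reading
# (0 means the reading is earlier than the first known event)
idx <- findInterval(as.numeric(temps$time), as.numeric(events$time))

# Drop readings that precede the first event
known <- idx > 0
temps <- temps[known, ]
idx <- idx[known]

# Mean temperature per interval; a sunrise starts a day, a sunset starts a night
means <- tapply(temps$temp, idx, mean)
started_by <- as.integer(names(means))
result <- data.frame(
  period_start = events$time[started_by],
  period = ifelse(events$event[started_by] == "sunrise", "day", "night"),
  mean_temp = as.numeric(means)
)
result
```

Readings later than the last event are assigned to the interval that the last event opens, so the event table should extend past the end of the temperature series.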
