3

New R user here looking for guidance. I am working with a 15-minute data set and looking to parse out the following by one variable (buildings in my case) for each day of the year: (1) lowest mean of "value" for n consecutive rows (preferably 2 or 3 hours worth) (2) standard deviation of the same period

Sample df:

    variable    year month day hr min   date     value
    building_a  2018    6   2   0   0   6/2/2018    19
    building_a  2018    6   2   0   15  6/2/2018    19
    building_a  2018    6   2   0   30  6/2/2018    19
    building_a  2018    6   2   0   45  6/2/2018    17
    building_a  2018    6   2   1   0   6/2/2018    17
    building_a  2018    6   2   1   15  6/2/2018    15
    building_a  2018    6   2   1   30  6/2/2018    15
    building_a  2018    6   2   1   45  6/2/2018    14
    building_a  2018    6   2   2   0   6/2/2018    14
    building_a  2018    6   2   2   15  6/2/2018    13
    building_a  2018    6   2   2   30  6/2/2018    13
    building_a  2018    6   2   2   45  6/2/2018    13
    building_a  2018    6   2   3   0   6/2/2018    12
    building_a  2018    6   2   3   15  6/2/2018    14
    building_a  2018    6   2   3   30  6/2/2018    13
    building_a  2018    6   2   3   45  6/2/2018    13
    building_b  2018    6   2   0   0   6/2/2018    37
    building_b  2018    6   2   0   15  6/2/2018    41
    building_b  2018    6   2   0   30  6/2/2018    38
    building_b  2018    6   2   0   45  6/2/2018    39
    building_b  2018    6   2   1   0   6/2/2018    37
    building_b  2018    6   2   1   15  6/2/2018    36
    building_b  2018    6   2   1   30  6/2/2018    34
    building_b  2018    6   2   1   45  6/2/2018    34
    building_b  2018    6   2   2   0   6/2/2018    35
    building_b  2018    6   2   2   15  6/2/2018    35
    building_b  2018    6   2   2   30  6/2/2018    29
    building_b  2018    6   2   2   45  6/2/2018    32
    building_b  2018    6   2   3   0   6/2/2018    30
    building_b  2018    6   2   3   15  6/2/2018    33
    building_b  2018    6   2   3   30  6/2/2018    30
    building_b  2018    6   2   3   45  6/2/2018    32

I've been able to perform this for one-hour segments using the following approach, but cannot figure out how to adapt this to a larger window (e.g., lowest 135 minute mean instead of 60 min).

    tmp <- aggregate(value~variable+date+hour, df, 
                               function(x) 
                                   c(mean = mean(x), sd = sd(x)))

    tmp2 <- do.call("data.frame",tmp)
    tmp2$value.mean <- as.numeric(tmp2$value.mean)
    tmp2$value.sd <- as.numeric(tmp2$value.sd)

    tmp2_flat <- tmp2 %>%
      group_by(variable, date) %>%
      filter(value.mean == min(value.mean)) %>%
      arrange(variable, date, value.sd) %>%
      slice(1)

Thank you in advance for any advice

ajslo
  • 33
  • 5

1 Answers1

0

I played a little around and this is what I came up with:

UPDATE: The last answer wasn't very practicable. There was no feedback but I'm changing it nevertheless.

library(zoo)
library(dplyr)

df %>%
  group_by(variable, date) %>%
  mutate(minimum =  rollapply(value, width = 4, FUN = mean, fill = NA, align = "right"),
         sd = rollapply(value, width = 4, FUN = sd, fill = NA, align = "right")) %>%
  slice(which.min(minimum))

# A tibble: 2 x 10
# Groups:   variable, date [2]
  variable    year month   day    hr   min date     value minimum    sd
  <fct>      <int> <int> <int> <int> <int> <fct>    <int>   <dbl> <dbl>
1 building_a  2018     6     2     3     0 6/2/2018    12    12.8  0.5 
2 building_b  2018     6     2     2    30 6/2/2018    29    33.2  2.87

The idea remains the same however. In the rollapply() function one can specifiy the n of consecutive rows via as width= argument. 4 means in this case 4 * 15 minutes = 1 hour, but can be any number of quarter hours. And it calculates a "moving average" of valueat each row by looking back width rows.

That should do it I hope.

Humpelstielzchen
  • 6,126
  • 3
  • 14
  • 34
  • This wasn't working as expected at first, but after changing "slice(which.min(value))" to "slice(which.min(minimum))," I'm getting the results I had anticipated. Thank you so much! – ajslo Mar 18 '19 at 05:23
  • you're right. my bad.. I did some bad copypasta. I'm gonna edit it. – Humpelstielzchen Mar 18 '19 at 05:37