
I have a per-minute time series spanning a number of years.

I need to compute the following value for each minute data point:

q <- (Fn-Fd)/Fn

Here Fn is the average F value at night, between 12 and 1 AM, and Fd is the minute data point itself.

Obviously Fn changes each day, so one approach would be to calculate Fn, perhaps using a dplyr function, but I would need to create a loop of some kind or reorganise my data frame...

dummy data:

# sequence of per-minute timestamps covering one month
datetime <- seq(
     from=as.POSIXct("2012-1-1 0:00:00", tz="UTC"),
     to=as.POSIXct("2012-2-1 0:00:00", tz="UTC"),
     by="min"
) 

#variable F
F <- runif(44641, min = 0, max =2)

#dataframe
df <- as.data.frame(cbind(datetime,F))
library(lubridate)
# cbind() drops the date-time class, so convert back to "POSIXct" "POSIXt"
df$datetime <- as_datetime(df$datetime)

Or a less elegant way might be to first get Fn on its own, for that time window, using dplyr. I think it would be something like this:

Fn <- df %>% 
  filter(between(as.numeric(format(datetime, "%H")), 0, 1)) %>% 
  group_by(hour=format(datetime, "%Y-%m-%d %H:")) %>%
  summarise(value=mean(df$F))

But I am not sure my syntax is correct there. Am I calculating the mean F between 12 and 1 AM per day?

Then I could just add the average Fn value to every minute of each day in my data frame and do the simple calculation to get q.
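
The summarise-then-join idea described above could be sketched roughly as follows. This is only one possible reading, not a definitive answer: it assumes `datetime` is stored in UTC, takes "12-1 AM" to mean clock hour 0 (00:00 to 00:59), and uses the hypothetical names `Fn_daily`, `date`, `Fn`, `q` and `result`, none of which appear in the original post. Inside `summarise()` the column is referenced as `F` rather than `df$F`, so the mean is taken within each group instead of over the whole data frame.

```r
library(dplyr)
library(lubridate)

set.seed(1)  # only so the dummy data is reproducible
datetime <- seq(
  from = as.POSIXct("2012-01-01 00:00:00", tz = "UTC"),
  to   = as.POSIXct("2012-02-01 00:00:00", tz = "UTC"),
  by   = "min"
)
df <- data.frame(datetime = datetime,
                 F = runif(length(datetime), min = 0, max = 2))

# One row per day: mean of F over clock hour 0 (00:00 to 00:59)
Fn_daily <- df %>%
  filter(hour(datetime) == 0) %>%
  group_by(date = as.Date(datetime, tz = "UTC")) %>%
  summarise(Fn = mean(F))  # F, not df$F, so the mean is per group

# Attach each day's Fn to every minute of that day, then compute q
result <- df %>%
  mutate(date = as.Date(datetime, tz = "UTC")) %>%
  left_join(Fn_daily, by = "date") %>%
  mutate(q = (Fn - F) / Fn)
```

The join keeps one row per minute, so `result` has the same number of rows as `df`, with the day's `Fn` repeated on every minute of that day.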

Thanks in advance for advice here.

Lmm

1 Answer

Maybe something like this?

library(dplyr)
library(lubridate)

df %>%
  group_by(Date = as.Date(datetime)) %>%
  mutate(F_mean = mean(F[hour(datetime) == 0]), 
         value = (F_mean - F)/F_mean) %>%
  ungroup() %>%
  select(-F_mean, -Date)


#             datetime     F  value
#   <dttm>              <dbl>  <dbl>
# 1 2012-01-01 00:00:00 1.97  -0.902
# 2 2012-01-01 00:01:00 0.194  0.813
# 3 2012-01-01 00:02:00 1.52  -0.467
# 4 2012-01-01 00:03:00 1.66  -0.599
# 5 2012-01-01 00:04:00 0.765  0.262
# 6 2012-01-01 00:05:00 1.31  -0.267
# 7 2012-01-01 00:06:00 1.62  -0.565
# 8 2012-01-01 00:07:00 0.642  0.380
# 9 2012-01-01 00:08:00 1.62  -0.560
#10 2012-01-01 00:09:00 1.68  -0.621
# ... with 44,631 more rows

We first group_by date, get the mean of F for the 0th hour (values from 00:00 to 00:59) of each day, and then calculate value using the formula given.
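
The grouping logic described above can be checked on a small deterministic data frame, where the expected result is easy to work out by hand. This is a sketch under assumptions: `toy` and `out` are hypothetical names, the F values are made up for illustration, and the time zone is passed to `as.Date()` explicitly so the grouping date and `hour()` refer to the same zone.

```r
library(dplyr)
library(lubridate)

# Two full days of minute data in UTC (2 * 1440 rows)
toy <- data.frame(
  datetime = seq(as.POSIXct("2012-01-01 00:00:00", tz = "UTC"),
                 by = "min", length.out = 2 * 1440)
)
# Made-up values: hour 0 holds the day of the month, every other minute holds 4
toy$F <- ifelse(hour(toy$datetime) == 0, day(toy$datetime), 4)

out <- toy %>%
  group_by(Date = as.Date(datetime, tz = "UTC")) %>%
  mutate(F_mean = mean(F[hour(datetime) == 0]),
         value  = (F_mean - F) / F_mean) %>%
  ungroup()

# Distinct combinations, to see what each day was assigned
distinct(out, Date, F_mean, value)
```

With these values, F_mean is 1 for every minute of the first day and 2 for every minute of the second, so value is 0 during hour 0 and (F_mean - 4)/F_mean for the rest of each day.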

Ronak Shah
  • That's great - thank you! Just to clarify on the breakdown of the syntax for my own better understanding - what does the last aspect of the code do - the ungroup() and select()? - thanks for your help – Lmm Jan 15 '19 at 22:58
  • @Lmm Since we are grouping by `Date`, the grouping would remain on the data frame if we didn't `ungroup()` it, and `select` with a `-` sign drops the extra columns that we don't need. – Ronak Shah Jan 15 '19 at 23:37
  • Great, thank you. Also, can we include na.rm in the mean calculation, e.g. mutate(F_mean = mean(F[hour(datetime) == 0], na.rm = T))? Would that work? – Lmm Jan 15 '19 at 23:43
  • @Lmm yes, you can include that and it will remove all the `NA` values. – Ronak Shah Jan 15 '19 at 23:48
  • One issue I have just come across is that this seems to be working in UTC, while I am trying to work in Canadian Mountain time. Is there any way I can specify my time zone within this function? Currently my F_mean appears at 6pm on the previous day. – Lmm Jan 16 '19 at 00:15
  • You can change the default time zone using https://stackoverflow.com/questions/6374874/how-to-change-the-default-time-zone-in-r – Ronak Shah Jan 16 '19 at 00:58
  • Hmmm, that still doesn't seem to work. The datetime in my data frame is Canada/Mountain and I have used Sys.setenv(TZ = "Canada/Mountain"), but the mutate call (F_mean = mean(F[hour(datetime) == 0])) is still treating 18:00, which would be 0 in UTC, as the hour to average. Could I specify the time zone within the mutate function? ... Also, really, thank you for all the help ;) – Lmm Jan 16 '19 at 18:29
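
One possible way to handle the last two points raised in the comments, offered as a sketch and not as the answerer's method: pass `na.rm = TRUE` to `mean()`, and derive both the grouping date and the hour from an explicit time zone instead of relying on the session default. The zone name "Canada/Mountain" is taken from the comment; `tz_local`, `local_time` and `res` are hypothetical names, and the data are assumed to be stored in UTC. Depending on the R version, `as.Date()` on a POSIXct value may default to UTC unless `tz` is given, so it is passed explicitly here.

```r
library(dplyr)
library(lubridate)

tz_local <- "Canada/Mountain"  # zone named in the comment; assumed available on the system

# Three days of minute data stored in UTC
df <- data.frame(
  datetime = seq(as.POSIXct("2012-01-01 00:00:00", tz = "UTC"),
                 by = "min", length.out = 3 * 1440)
)
set.seed(1)  # only so the dummy data is reproducible
df$F <- runif(nrow(df), min = 0, max = 2)
df$F[430] <- NA  # 07:09 UTC, i.e. 00:09 Mountain time, inside local hour 0

res <- df %>%
  mutate(local_time = with_tz(datetime, tz_local)) %>%    # same instants, local clock
  group_by(Date = as.Date(local_time, tz = tz_local)) %>% # local calendar date
  mutate(F_mean = mean(F[hour(local_time) == 0], na.rm = TRUE),
         value  = (F_mean - F) / F_mean) %>%
  ungroup() %>%
  select(-Date)  # F_mean is kept here so it can be inspected
```

The first few hours of UTC data fall on the previous local date, which contains no local hour 0 in this sample, so F_mean is NaN for those rows; a partial first day would need to be dropped or handled separately.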