1

I have a dataframe with below data ( Average of the values of timestamp 7.50 and 7.40 should be my value of A for time Stamp 7.45)

Date_Time     |         A
7/28/2017 8:00|     443.75
7/28/2017 7:50|     440.75
7/28/2017 7:45|     NA
7/28/2017 7:40|     447.5
7/28/2017 7:30|     448.75
7/28/2017 7:20|     444.5
7/28/2017 7:15|     NA
7/28/2017 7:10|     440.25
7/28/2017 7:00|     447.5

I want it to transform into 15 min interval something like below using mean:

Date / Time   |    Object Value
7/28/2017 8:00|        465
7/28/2017 7:45|        464.875
7/28/2017 7:30|        464.75
7/28/2017 7:15|        464.875
7/28/2017 7:00|        465
Jaap
  • 81,064
  • 34
  • 182
  • 193
Tilo
  • 409
  • 1
  • 5
  • 14
  • do you take the value at `7:45` to be the average of values at `7:40` and `7:50`? – Aramis7d Sep 11 '17 at 05:29
  • Yes, I want average of 7.40 and 7.50 to be tagged as 7.45 – Tilo Sep 11 '17 at 05:31
  • I think @Aramis7d is asking you how do you determine the value on 7:45? – www Sep 11 '17 at 05:32
  • 1
    @Tilo I saw you just changed your example input data frame. This is a really bad practice when you changed your input data while others have already worked on your question. – www Sep 11 '17 at 05:51
  • @Tilo For your new example, `library(imputeTS) dt$A <- na.interpolation(dt$A)` will probably work. After that you can filter out your dataframe based on the 15 minutes interval. but I have not time to update my answer. Please provide clear, good problem statement, reproducible example, and desired output next time. – www Sep 11 '17 at 05:56

2 Answers2

1

Updat

The OP changes his or her desired output. Since I have no time to update my answer, I will leave my answer as it is. See my comment in the original post to see how to use na.interpolation to fill in the missing values.

Original Post

This solution assumes you calculated the average based on the average values in 8:00, 7:30, and 7:00.

library(dplyr)
library(tidyr)
library(lubridate)
library(imputeTS)


dt2 <- dt %>%
  mutate(Date.Time = mdy_hm(Date.Time)) %>%
  filter(Date.Time %in% seq(min(Date.Time), max(Date.Time), by = "15 min")) %>%
  complete(Date.Time = seq(min(Date.Time), max(Date.Time), by = "15 min")) %>%
  mutate(Object.Value = na.interpolation(Object.Value)) %>%
  fill(Object.Name) %>%
  arrange(desc(Date.Time))

dt2
# A tibble: 5 x 3
            Date.Time Object.Name Object.Value
               <dttm>       <chr>        <dbl>
1 2017-07-28 08:00:00           a      465.000
2 2017-07-28 07:45:00           a      464.875
3 2017-07-28 07:30:00           a      464.750
4 2017-07-28 07:15:00           a      464.875
5 2017-07-28 07:00:00           a      465.000

Data

dt <- read.table(text = "'Date Time' 'Object Name' 'Object Value'
'7/28/2017 8:00' a 465
                 '7/28/2017 7:50' a 465
                 '7/28/2017 7:40' a 464.75
                 '7/28/2017 7:30' a 464.75
                 '7/28/2017 7:20' a 464.75
                 '7/28/2017 7:10' a 465
                 '7/28/2017 7:00' a 465",
                 header = TRUE, stringsAsFactors = FALSE)
www
  • 38,575
  • 12
  • 48
  • 84
0

If the values measured on the 10-minute intervals are time-integrated averages over that period, it's reasonable to average them to a different period. If these are instantaneous measurements, then it's more reasonable to smooth them as others have suggested.

To take time-integrated averages measured on the 10-minute schedule and average those to the 15-minute schedule, you can use the intervalaverage package:


library(data.table)
library(intervalaverage)
x <- structure(list(time = c("7/28/2017 8:00", "7/28/2017 7:50", "7/28/2017 7:45", 
                             "7/28/2017 7:40", "7/28/2017 7:30", "7/28/2017 7:20", "7/28/2017 7:15", 
                             "7/28/2017 7:10", "7/28/2017 7:00"), A = c(443.75, 440.75, NA, 
                                                                        447.5, 448.75, 444.5, NA, 440.25, 447.5)), row.names = c(NA, 
                                                                                                                                 -9L), class = "data.frame")
y <- structure(list(time = c("7/28/2017 8:00", "7/28/2017 7:45", "7/28/2017 7:30", 
                             "7/28/2017 7:15", "7/28/2017 7:00")), row.names = c(NA, -5L), class = "data.frame")

setDT(x)
setDT(y)

x
#>              time      A
#> 1: 7/28/2017 8:00 443.75
#> 2: 7/28/2017 7:50 440.75
#> 3: 7/28/2017 7:45     NA
#> 4: 7/28/2017 7:40 447.50
#> 5: 7/28/2017 7:30 448.75
#> 6: 7/28/2017 7:20 444.50
#> 7: 7/28/2017 7:15     NA
#> 8: 7/28/2017 7:10 440.25
#> 9: 7/28/2017 7:00 447.50
y
#>              time
#> 1: 7/28/2017 8:00
#> 2: 7/28/2017 7:45
#> 3: 7/28/2017 7:30
#> 4: 7/28/2017 7:15
#> 5: 7/28/2017 7:00

x[, time:=as.POSIXct(time,format='%m/%d/%Y %H:%M',tz = "UTC")]
setnames(x, "time","start_time")
x[, start_time_integer:=as.integer(start_time)]

y[, time:=as.POSIXct(time,format='%m/%d/%Y %H:%M',tz = "UTC")]
setnames(y, "time","start_time")
y[, start_time_integer:=as.integer(start_time)]

setkey(y, start_time)
setkey(x, start_time)


##drop time times at 15 and 45
x <- x[!start_time %in% as.POSIXct(c("2017-07-28 07:45:00","2017-07-28 07:15:00"),tz="UTC")]


x[, end_time_integer:=as.integer(start_time)+60L*10L-1L]
x[, end_time:=as.POSIXct(end_time_integer,origin="1969-12-31 24:00:00",tz = "UTC")]


y[, end_time_integer:=as.integer(start_time)+60L*15L-1L]
y[, end_time:=as.POSIXct(end_time_integer,origin="1969-12-31 24:00:00",tz = "UTC")]


x
#>             start_time      A start_time_integer end_time_integer
#> 1: 2017-07-28 07:00:00 447.50         1501225200       1501225799
#> 2: 2017-07-28 07:10:00 440.25         1501225800       1501226399
#> 3: 2017-07-28 07:20:00 444.50         1501226400       1501226999
#> 4: 2017-07-28 07:30:00 448.75         1501227000       1501227599
#> 5: 2017-07-28 07:40:00 447.50         1501227600       1501228199
#> 6: 2017-07-28 07:50:00 440.75         1501228200       1501228799
#> 7: 2017-07-28 08:00:00 443.75         1501228800       1501229399
#>               end_time
#> 1: 2017-07-28 07:09:59
#> 2: 2017-07-28 07:19:59
#> 3: 2017-07-28 07:29:59
#> 4: 2017-07-28 07:39:59
#> 5: 2017-07-28 07:49:59
#> 6: 2017-07-28 07:59:59
#> 7: 2017-07-28 08:09:59
y
#>             start_time start_time_integer end_time_integer            end_time
#> 1: 2017-07-28 07:00:00         1501225200       1501226099 2017-07-28 07:14:59
#> 2: 2017-07-28 07:15:00         1501226100       1501226999 2017-07-28 07:29:59
#> 3: 2017-07-28 07:30:00         1501227000       1501227899 2017-07-28 07:44:59
#> 4: 2017-07-28 07:45:00         1501227900       1501228799 2017-07-28 07:59:59
#> 5: 2017-07-28 08:00:00         1501228800       1501229699 2017-07-28 08:14:59
out <- intervalaverage(x,y,interval_vars=c("start_time_integer","end_time_integer"),value_vars="A")

out[, start_time:=as.POSIXct(start_time_integer,origin="1969-12-31 24:00:00",tz="UTC")]
out[, end_time:=as.POSIXct(end_time_integer,origin="1969-12-31 24:00:00",tz="UTC")]
out[, list(start_time,end_time, A)]
#>             start_time            end_time        A
#> 1: 2017-07-28 07:00:00 2017-07-28 07:14:59 445.0833
#> 2: 2017-07-28 07:15:00 2017-07-28 07:29:59 443.0833
#> 3: 2017-07-28 07:30:00 2017-07-28 07:44:59 448.3333
#> 4: 2017-07-28 07:45:00 2017-07-28 07:59:59 443.0000
#> 5: 2017-07-28 08:00:00 2017-07-28 08:14:59       NA


#Note that this just equivalent to taking weighted.mean:

weighted.mean(c(447.5,440.25),w=c(10,5))
#> [1] 445.0833
weighted.mean(c(440.25,444.5),w=c(5,10))
#> [1] 443.0833
#etc

Note that the intervalaverage package requires integer columns defining closed intervals, hence the conversion to integer. integers are converted back to datetime (POSIXct) for readability.

Michael
  • 5,808
  • 4
  • 30
  • 39