4

I have a dataset of depth over time (a 2 month period). The minimum time interval between depths is less than 1-min and the maximum is several days. in R, I want to calculate a moving average of depth based on a 6-hour (or 12-hour) time window around each observation (not a window based on lagging/leading number of observations).

I have tried the zoo package, but I couldn't seem to get rollmean to work for me.

A small subset of my data is: https://www.dropbox.com/s/lhhrdgt2mxasc9v/fid57.depth.test1.csv

And in R it looks like:

> str(my.data)
'data.frame':   51 obs. of  2 variables:
 $ DateTime: POSIXct, format: "2013-08-07 06:49:46" "2013-08-07 06:55:17" "2013-08-07" 07:06:52" "2013-08-07 07:23:43" ...
 $ Depth   : num  28.6 31.7 29 35.2 33 ...

 >head(my.data)
DateTime "Depth"
2013-08-07 06:49:46 28.58
2013-08-07 06:55:17 31.7
2013-08-07 07:06:52 29.02
2013-08-07 07:23:43 35.18
2013-08-07 07:27:14 32.98
2013-08-07 08:20:21 55.84

> dput(head(my.data))
structure(list(DateTime = structure(c(1375883386, 1375883717, 
1375884412, 1375885423, 1375885634, 1375888821), class = c("POSIXct", 
"POSIXt"), tzone = ""), Depth = c(28.58, 31.7, 29.02, 35.18, 
32.98, 55.84)), .Names = c("DateTime", "Depth"), row.names = c(8481L, 
8483L, 8484L, 8485L, 8487L, 8495L), class = "data.frame")

Any suggestions would be much appreciated.
Thank you in advance!

madeckmann
  • 61
  • 1
  • 1
  • 5
  • Please add the output from `dput(head(my.data))` to the question. – Matthew Lundberg Jan 20 '14 at 19:57
  • I don't think `rollapply` will work here due to the changing window size, but since you're looking to calculate the mean you should be able to come up with fast algorithms that track the rolling six hour window. See [this discussion](http://stackoverflow.com/questions/21062927/need-faster-rolling-apply-function-with-start-to-stop-indices/21071229#21071229). – BrodieG Jan 20 '14 at 19:57
  • Would this (http://stackoverflow.com/questions/20134823/r-faster-way-to-calculate-rolling-statistics-over-a-variable-interval/20137464#20137464) answer your question? I think it might. It should give you your answer in a few milliseconds or less. The wordings of your questions are quite different, but I think the two questions are essentially asking the same thing. – kdauria Jan 29 '14 at 16:41

2 Answers2

2

This is not what you asked for but just in case simply cutting it up into 6 hour non-overlapping intervals and averaging within those is sufficient then here is some code:

library(zoo)

z <- read.zoo("fid57.depth.test1.csv", header = TRUE, 
              index = 1:2, format = "%Y-%m-%d %H:%M:%S", tz = "")
z6 <- aggregate(xx, as.POSIXct(cut(time(z), "6 hours")), mean)

giving this:

> z6
2013-08-07 06:00:00 2013-08-07 12:00:00 2013-08-07 18:00:00 2013-08-08 00:00:00 
           43.40810            39.13500            22.31250            17.38333 
2013-08-08 06:00:00 2013-08-08 12:00:00 2013-08-08 18:00:00 2013-08-09 00:00:00 
                 NA            15.53333                  NA                  NA 
2013-08-09 06:00:00 2013-08-09 12:00:00 
                 NA            23.30455 

If the NA entries are not wanted use na.omit(z6).

Also note that the input file has a .csv extension but is not a csv file.

The data used in the example above was this:

"DateTime ""Depth"""
2013-08-07 06:49:46 28.58
2013-08-07 06:55:17 31.7
2013-08-07 07:06:52 29.02
2013-08-07 07:23:43 35.18
2013-08-07 07:27:14 32.98
2013-08-07 08:20:21 55.84
2013-08-07 09:05:35 47.05
2013-08-07 09:10:28 65.96
2013-08-07 09:37:21 40.01
2013-08-07 09:44:59 47.05
2013-08-07 09:58:30 43.53
2013-08-07 10:02:45 47.49
2013-08-07 10:07:23 47.93
2013-08-07 10:11:31 56.28
2013-08-07 10:15:38 61.12
2013-08-07 10:19:39 53.2
2013-08-07 10:27:28 43.53
2013-08-07 10:31:44 40.89
2013-08-07 10:45:19 31.2
2013-08-07 10:47:29 31.7
2013-08-07 10:49:44 41.33
2013-08-07 12:01:00 33.86
2013-08-07 12:05:06 35.62
2013-08-07 17:25:35 43.53
2013-08-07 17:40:25 43.53
2013-08-07 18:15:03 42.65
2013-08-07 21:29:33 16.3
2013-08-07 22:05:15 14.9
2013-08-07 22:07:44 15.4
2013-08-08 02:18:36 16.3
2013-08-08 02:23:26 16.3
2013-08-08 03:34:21 16.3
2013-08-08 03:55:46 16.7
2013-08-08 05:05:53 17.6
2013-08-08 05:10:27 21.1
2013-08-08 15:36:02 16.7
2013-08-08 16:12:20 12.8
2013-08-08 16:16:55 17.1
2013-08-09 13:17:04 22.4
2013-08-09 13:22:32 21.1
2013-08-09 13:25:58 24.2
2013-08-09 13:37:01 15.4
2013-08-09 13:40:16 14.1
2013-08-09 13:46:46 14.1
2013-08-09 13:54:31 27.26
2013-08-09 14:18:53 40.89
2013-08-09 14:22:34 21.5
2013-08-09 14:26:52 28.14
2013-08-09 14:36:35 27.26
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
javlacalle
  • 1,029
  • 7
  • 15
0

I recommend using runner package and built-in function mean_run. Your issue is desribed in section Window depending on date in vignette. Below example of 7 days average.

# random value and irregular data generation
x <- runif(15)
date <- as.Date(cumsum(rpois(n = 15, lambda = 2)), origin = Sys.Date())

library(runner)
mean_run(x, k = 7, idx = date)
GoGonzo
  • 2,637
  • 1
  • 18
  • 25