
I have a selection of scattered timestamp data based on requests to a particular service. This data covers approximately 3.5-4 years of requests against this service.

I am looking to turn this selection of variable-interval timestamps into a frequency-binned timeseries in R.

How would I go about converting these timestamps into a frequency-binned timeseries, such as "between 1:00 and 1:15 PM on this day, there were 7 requests, between 1:15 and 1:30 PM there were 2, and between 1:30 and 1:45 PM there were 0", making sure that empty bins are also included (with a count of 0)?

The data is just a vector of timestamps from a database dump, all in the format "2014-02-17 13:10:46". Just a big ol' vector with ~2 million elements in it.
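Because the values come out of the database dump as character strings, they have to be parsed into date-times before any binning. Below is a minimal sketch that assumes the strings are in UTC (change `tz` to the zone the database actually uses). `stamps` and `x` are placeholder names, and the three values are made up:

```r
# A few strings in the format described above (values made up for illustration)
stamps <- c("2014-02-17 13:10:46", "2014-02-17 13:03:12", "2014-02-17 13:20:05")

# Parse into POSIXct; the UTC time zone is an assumption
x <- as.POSIXct(stamps, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Strings that do not match the format become NA, so check for them
stopifnot(!anyNA(x))
x
```

This is one vectorised call, so it should also cope with a few million strings.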

Sydney S.
  • Please share a sample of what your data looks like. Just share the output of `dput(head(data))` in the question description. – tushaR Jan 08 '18 at 05:52
  • The data is literally just a vector of a bunch (and I mean a BUNCH) of timestamps pulled from our database. So it looks like a few million points of this format: "2014-02-17 13:10:46". – Sydney S. Jan 08 '18 at 06:02
  • This looks like this question: https://stackoverflow.com/questions/38339812/binning-time-data-in-r – Scipione Sarlo Jan 08 '18 at 09:47


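You could use the time-series tools from the xts and zoo packages. Note that an xts object needs some artificial 'data' alongside the timestamps, so each timestamp gets a dummy value of 1 (the timestamps below are also made up for illustration):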

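library(xts)
set.seed(42)
# Ten artificial request times on 2018-01-08, alternating between hours 8 and 9,
# with random minutes 0-59 (exact counts depend on the random draws, so they may
# differ from the output shown below)
ts.index <- ISOdatetime(2018, 1, 8, 8:9, sample(0:59, 10), 0)
# Dummy value of 1 per request, indexed by its timestamp
ts <- xts(rep(1, length(ts.index)), ts.index)
# Round each time down to its 15-minute (900 s) bin and count the entries per bin
aggregate(ts, time(ts) - as.numeric(time(ts)) %% 900, length, regular = TRUE)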
#>                      
#> 2018-01-08 08:15:00 1
#> 2018-01-08 08:30:00 3
#> 2018-01-08 08:45:00 1
#> 2018-01-08 09:00:00 1
#> 2018-01-08 09:15:00 1
#> 2018-01-08 09:45:00 3
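The grouping expression `time(ts) - as.numeric(time(ts)) %% 900` rounds each timestamp down to the start of its 15-minute interval. `as.numeric()` gives seconds since 1970-01-01 UTC, `%% 900` is the remainder past the last quarter-hour boundary, and subtracting that remainder lands exactly on the boundary. Here is a small base-R check on a single made-up timestamp:

```r
stamp <- as.POSIXct("2014-02-17 13:10:46", tz = "UTC")

# Seconds elapsed since the last 15-minute (900 s) boundary: 10 min 46 s = 646 s
offset <- as.numeric(stamp) %% 900

# Subtracting the offset moves the time back to the start of its bin
bin_start <- stamp - offset
format(bin_start, "%Y-%m-%d %H:%M:%S")  # "2014-02-17 13:00:00"
```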

Edit: If you want to include bins without observations, you can convert to a strictly regular ts object and replace the inserted NA values with zero:

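# Count per 15-minute bin, as above
raw <- aggregate(ts, time(ts) - as.numeric(time(ts)) %% 900, length, regular = TRUE)
# as.ts() inserts NA for empty bins, na.fill() sets them to 0, as.xts() restores the timestamps
as.xts(na.fill(as.ts(raw), 0), dateFormat = "POSIXct")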
#>                     zoo(coredata(x), tt)
#> 2018-01-08 08:15:00                    1
#> 2018-01-08 08:30:00                    3
#> 2018-01-08 08:45:00                    1
#> 2018-01-08 09:00:00                    1
#> 2018-01-08 09:15:00                    1
#> 2018-01-08 09:30:00                    0
#> 2018-01-08 09:45:00                    3
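If you would rather skip the `ts`/`xts` round trip, base R can produce zero-filled counts directly. This is an alternative technique, not the one used in the answer. First build every 15-minute break point that covers the data. Then `cut()` assigns each timestamp to an interval, and `table()` counts them. Empty intervals remain factor levels, so they show up with a count of 0. The timestamps and variable names are made up for illustration, and UTC is assumed:

```r
# Hypothetical timestamps in the question's format, parsed as UTC (an assumption)
stamps <- c("2014-02-17 13:10:46", "2014-02-17 13:03:12",
            "2014-02-17 13:20:05", "2014-02-17 13:58:59")
x <- as.POSIXct(stamps, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

width <- 900  # bin width in seconds (15 minutes)

# Start at the 15-minute boundary at or before the earliest time ...
from <- as.POSIXct(floor(as.numeric(min(x)) / width) * width,
                   origin = "1970-01-01", tz = "UTC")
# ... and end one bin past the start of the latest time's bin
to <- as.POSIXct((floor(as.numeric(max(x)) / width) + 1) * width,
                 origin = "1970-01-01", tz = "UTC")

# Every 15-minute boundary in the range, including ones with no requests
breaks <- seq(from, to, by = width)

# cut() assigns each time to a left-closed interval [start, start + 15 min);
# table() counts per interval and keeps empty intervals as 0
counts <- table(cut(x, breaks = breaks, right = FALSE))
counts
```

Passing `breaks = "15 min"` to `cut()` also works. In that case, though, the first interval starts at the earliest observation's minute rather than on a quarter-hour, which is why the breaks are built explicitly here. `as.data.frame(counts)` turns the result into a two-column data frame (`Var1`, `Freq`).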

Edit 2: It also works for the sample data provided in the comments:

library(xts)
# The six sample timestamps from the comments, as seconds since 1970-01-01 UTC
data <- c(1228917812, 1245038910, 1245986979, 1268750482, 1281615510, 1292561113)
class(data) <- c("POSIXct", "POSIXt")
attr(data, "tzone") <- "UTC"
dput(data)
#> structure(c(1228917812, 1245038910, 1245986979, 1268750482, 1281615510, 
#> 1292561113), class = c("POSIXct", "POSIXt"), tzone = "UTC")
# Wrap the timestamps in an xts object with a dummy value of 1 each
ts <- xts(rep(1, length(data)), data)
raw <- aggregate(ts, time(ts) - as.numeric(time(ts)) %% 900, length, regular = TRUE)
head(as.xts(na.fill(as.ts(raw), 0), dateFormat = "POSIXct"))
#>                     zoo(coredata(x), tt)
#> 2008-12-10 15:00:00                    1
#> 2008-12-10 15:15:00                    0
#> 2008-12-10 15:30:00                    0
#> 2008-12-10 15:45:00                    0
#> 2008-12-10 16:00:00                    0
#> 2008-12-10 16:15:00                    0
Ralf Stubner
  • Is there a way to tell it to include "0" bins as well? – Sydney S. Jan 08 '18 at 13:01
  • When I try to run the aggregation, it fails with this error: `Error in aggregate.data.frame(as.data.frame(x), ...) : 'by' must be a list` – Sydney S. Jan 08 '18 at 19:03
  • @HaroldSchreckengost Please provide a [minimal, reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) displaying the problem. – Ralf Stubner Jan 08 '18 at 19:25
  • When I run `dput(head(ts))`, this is the result: `structure(c(1228917812, 1245038910, 1245986979, 1268750482, 1281615510, 1292561113), class = c("POSIXct", "POSIXt"), tzone = "UTC")`. If I run the command you gave above, it gives the mentioned error. – Sydney S. Jan 08 '18 at 21:32
  • @HaroldSchreckengost I cannot reproduce the error. See edited answer for my minimal example. Does my example work on your machine? If it does, you could extend it (probably by adding more data) until it shows the error. Hint: The `reprex` package makes it easy to produce minimal working examples. – Ralf Stubner Jan 08 '18 at 21:53
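The `dput(head(ts))` output above shows a plain `POSIXct` vector. That suggests, though it is only an assumption, that the `ts` passed to `aggregate()` was the raw timestamps rather than the object built with `xts(rep(1, length(data)), data)`. This would account for the message. For an object that is neither zoo/xts nor `ts`, `aggregate()` falls back to the default method, which calls `aggregate.data.frame()`, and that function requires `by` to be a list. A base-R sketch reproducing the message under that assumption (the vector is hypothetical):

```r
# A plain POSIXct vector, like the one shown by dput(head(ts)) in the comments
stamps <- .POSIXct(c(1228917812, 1245038910, 1245986979), tz = "UTC")

# Without an xts/zoo wrapper, aggregate() dispatches to the default method,
# which converts x to a data frame and expects 'by' to be a list
msg <- tryCatch(
  aggregate(stamps, stamps - as.numeric(stamps) %% 900, length),
  error = function(e) conditionMessage(e)
)
msg

# Checking the class before aggregating makes the mix-up easy to spot
inherits(stamps, "zoo")
```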