Convert timestamps to frequency-binned timeseries in R?

Question

I have a selection of scattered timestamp data based on requests to a particular service. This data covers approximately 3.5-4 years of requests against this service.

I am looking to turn this selection of variable-interval timestamps into a frequency-binned timeseries in R.

How would I go about converting these timestamps into a frequency-binned timeseries, such as "between 1:00 and 1:15 PM on this day there were 7 requests, between 1:15 and 1:30 PM there were 2, and between 1:30 and 1:45 PM there were 0", making sure that bins with no requests are included as well?

The data is just a vector of timestamps from a database dump, all of the format "2014-02-17 13:10:46". Just a big ol' vector with ~2 million elements in it.

Please share a sample of what your data looks like. Just share output of `dput(head(data))` in the question description. — tushaR, Jan 08 '18 at 05:52
The data is literally just a vector of a bunch (and I mean a BUNCH) of timestamps pulled from our database. So it looks like a few million points of this format: "2014-02-17 13:10:46". — Sydney S., Jan 08 '18 at 06:02
This looks like this question: https://stackoverflow.com/questions/38339812/binning-time-data-in-r — Scipione Sarlo, Jan 08 '18 at 09:47

Ralf Stubner · Accepted Answer · 2018-01-08T21:50:31.063

You could use the tools for handling time series data from the xts and zoo packages. Note that you will need some artificial 'data' to attach to the timestamps (here, a column of ones):

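library(xts)
set.seed(42)
# Ten example timestamps on 2018-01-08 in hours 8 and 9 with random minutes
ts.index <- ISOdatetime(2018, 1, 8, 8:9, sample(60, 10), 0)
# One dummy observation (1) per timestamp
ts <- xts(rep(1, length(ts.index)), ts.index)
# Floor each timestamp to its 15-minute bin (900 s) and count entries per bin
aggregate(ts, time(ts) - as.numeric(time(ts)) %% 900, length, regular = TRUE)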
#>                      
#> 2018-01-08 08:15:00 1
#> 2018-01-08 08:30:00 3
#> 2018-01-08 08:45:00 1
#> 2018-01-08 09:00:00 1
#> 2018-01-08 09:15:00 1
#> 2018-01-08 09:45:00 3
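
The grouping expression `time(ts) - as.numeric(time(ts)) %% 900` is what assigns each timestamp to a bin: it subtracts the number of seconds elapsed since the last multiple of 900 s, which snaps the timestamp back to the start of its 15-minute interval. A minimal base R sketch of just that step, using made-up timestamps and the illustrative names `stamps`, `bin_width` and `bin_start`:

```r
# Floor POSIXct timestamps to the start of their 15-minute bin (900 seconds).
# The example timestamps are made up for illustration.
stamps <- as.POSIXct(c("2018-01-08 08:17:00", "2018-01-08 08:29:59",
                       "2018-01-08 08:30:00"), tz = "UTC")
bin_width <- 900  # seconds in 15 minutes

# Seconds since the epoch modulo the bin width is the offset into the bin;
# subtracting it snaps each timestamp back to the bin start.
bin_start <- stamps - as.numeric(stamps) %% bin_width
format(bin_start, "%H:%M:%S")
#> [1] "08:15:00" "08:15:00" "08:30:00"
```

Because the arithmetic is done on seconds since the epoch, the bins land on the quarter hour in any time zone whose UTC offset is a multiple of 15 minutes. Changing `bin_width` gives other bin sizes.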

Edit: If you want to include bins without observations, you can convert to a strictly regular ts object and replace the inserted NA values with zero:

raw <- aggregate(ts, time(ts) - as.numeric(time(ts)) %% 900, length, regular = TRUE)
as.xts(na.fill(as.ts(raw), 0), dateFormat = "POSIXct")
#>                     zoo(coredata(x), tt)
#> 2018-01-08 08:15:00                    1
#> 2018-01-08 08:30:00                    3
#> 2018-01-08 08:45:00                    1
#> 2018-01-08 09:00:00                    1
#> 2018-01-08 09:15:00                    1
#> 2018-01-08 09:30:00                    0
#> 2018-01-08 09:45:00                    3
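
For comparison, the same kind of count with empty bins filled in can probably also be obtained without xts. The following is only a sketch in base R, not the method used above; the timestamps are made up and the names `stamps`, `first_bin`, `bin_id` and `counts` are illustrative:

```r
# Count events per 15-minute bin in base R, keeping empty bins as zeros.
# The timestamps are made up for illustration.
stamps <- as.POSIXct(c("2018-01-08 13:02:10", "2018-01-08 13:10:46",
                       "2018-01-08 13:20:05", "2018-01-08 13:50:30"),
                     tz = "UTC")
bin_width <- 900  # seconds in 15 minutes

secs <- as.numeric(stamps)
first_bin <- min(secs) - min(secs) %% bin_width   # start of the earliest bin
bin_id <- (secs - first_bin) %/% bin_width + 1    # 1-based bin number

# tabulate() counts every bin number from 1 to nbins, so bins that never
# occur in bin_id get a count of zero.
counts <- data.frame(
  bin_start = as.POSIXct(first_bin + bin_width * (seq_len(max(bin_id)) - 1),
                         origin = "1970-01-01", tz = "UTC"),
  n = tabulate(bin_id, nbins = max(bin_id))
)
counts$n
#> [1] 2 1 0 1
```

Here the 13:30 bin has no timestamps and still shows up with a count of 0.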

Edit 2: It also works for the provided sample data:

library(xts)
data <- c(1228917812, 1245038910, 1245986979, 1268750482, 1281615510, 1292561113)
class(data) = c("POSIXct", "POSIXt")
attr(data, "tzone") <- "UTC"
dput(data)
#> structure(c(1228917812, 1245038910, 1245986979, 1268750482, 1281615510, 
#> 1292561113), class = c("POSIXct", "POSIXt"), tzone = "UTC")
ts <- xts(rep(1, length(data)), data)
raw <- aggregate(ts, time(ts) - as.numeric(time(ts)) %% 900, length, regular = TRUE)
head(as.xts(na.fill(as.ts(raw), 0), dateFormat = "POSIXct"))
#>                     zoo(coredata(x), tt)
#> 2008-12-10 15:00:00                    1
#> 2008-12-10 15:15:00                    0
#> 2008-12-10 15:30:00                    0
#> 2008-12-10 15:45:00                    0
#> 2008-12-10 16:00:00                    0
#> 2008-12-10 16:15:00                    0

When I try to run the aggregation, it fails with error: Error in aggregate.data.frame(as.data.frame(x), ...) : 'by' must be a list — Sydney S., Jan 08 '18 at 19:03
@HaroldSchreckengost Please provide a [minimal, reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) displaying the problem. — Ralf Stubner, Jan 08 '18 at 19:25
When I run dput(head(ts)) this is the result: structure(c(1228917812, 1245038910, 1245986979, 1268750482, 1281615510, 1292561113), class = c("POSIXct", "POSIXt"), tzone = "UTC") If I run the command you gave above, it gives the mentioned error. — Sydney S., Jan 08 '18 at 21:32
@HaroldSchreckengost I cannot reproduce the error. See edited answer for my minimal example. Does my example work on your machine? If it does, you could extend it (probably by adding more data) until it shows the error. Hint: The `reprex` package makes it easy to produce minimal working examples. — Ralf Stubner, Jan 08 '18 at 21:53
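
One possible explanation for the `'by' must be a list` error, judging from the `dput()` output quoted in the comments: there, `ts` is a plain POSIXct vector rather than an xts object, so `aggregate()` falls through to the data frame method instead of the zoo one. A hedged sketch of the full path from character timestamps to the aggregation, assuming the dump is a character vector in the format given in the question (the names `raw_stamps`, `stamps`, `events` and `binned` and the UTC time zone are assumptions):

```r
library(xts)

# Hypothetical stand-in for the database dump: timestamps as character strings.
raw_stamps <- c("2014-02-17 13:10:46", "2014-02-17 13:12:03",
                "2014-02-17 13:47:19")

# Parse to POSIXct first; the time zone is an assumption, adjust to your data.
stamps <- as.POSIXct(raw_stamps, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Wrap in an xts object so that aggregate() dispatches to the zoo method.
events <- xts(rep(1, length(stamps)), order.by = stamps)

# Same 15-minute binning as in the answer, applied to the xts object.
binned <- aggregate(events, time(events) - as.numeric(time(events)) %% 900,
                    length, regular = TRUE)
binned
```

If `class(events)` does not include `"xts"` and `"zoo"` before the `aggregate()` call, the wrapping step was skipped.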
