5

Recently, we were facing a very annoying issue with InfluxDB's GROUP BY time. It turns out that if we are trying to aggregate data for each 30d, InfluxDB aggregates our data by unexpected time buckets.

For example the following query:

SELECT COUNT(user_id) AS result FROM measurement1 WHERE time > '2017-12-31 23:59:59' AND time < '2019-01-01 23:59:59' GROUP BY time(30d) FILL(0);

Then we get the following response (epoch time in millis):

time                result
----                ------
1513728000000000000 0
1516320000000000000 0
1518912000000000000 0
1521504000000000000 0
1524096000000000000 0
1526688000000000000 0
1529280000000000000 0
1531872000000000000 0
1534464000000000000 4
1537056000000000000 1
1539648000000000000 0
1542240000000000000 0
1544832000000000000 0

Well, after converting the epoch time to normal dates, we find out that the returned intervals were at 20/12/17, 19/01/18 till the 15/12/18 (every 30 days).

As I understood, the aggregation points are pre-defined by influxdb according to the first value in time (GROUP BY time(value)). It was even mentioned in the documentation but on the much smaller scale - 'preset time boundary'. However, the examples dealt with minutes and 1-day aggregations, and it was easily fixed with the offset parameter because the default aggregation intervals of these scales were at midnight.

That's cool, but here we are dealing with multiple days. In our case, we cannot play with the offset parameter because we cannot know the time intervals returned by the GROUP BY.

Is there any source/ formula/algorithm or anything to help us in the prediction of these time intervals so we can offset them? If there's no such thing, then how can we overcome this issue?

I guess the reason for all this is performance, but it's very strange that this issue wasn't mentioned in their documentation because this is not an intuitive behavior.

Edit: I think I found out how influx determines these time intervals - it always starts from 0 epoch time. If that's true, then we can play with the offset as we like before shooting the query. I would expect this to be added to their documentation because that can save a lot of time for others + it would serve as a confirmation that no breaking changes will come in next releases.

Aladin
  • 492
  • 1
  • 8
  • 21
  • Similar problem here. I get confusing results because of the time buckets, and I don't know when the preset time boundaries start and therefore can't set the offset on the group by time. – bgusach Feb 27 '19 at 13:16
  • Please read my "Edit" - I found the answer for the time bucket calculation. We are already using that in production and it works (we get the expected time buckets). – Aladin Mar 03 '19 at 08:59

0 Answers0