2

I am going to be using Athena for report generation on data available in S3. A lot of it is time series data coming from IoT devices.

Users can request reports over years and years' worth of data, but the reports will mostly be weekly, monthly or annual.

I am thinking of saving aggregates every 15 minutes, e.g. at 12:00, 12:15, 12:30, 12:45, 1:00, etc. The aggregates must always fall on the exact quarter-hour boundaries and never at offsets such as 12:03 or 12:18. Is this possible with Kinesis Data Analytics? If yes, how?

If not, does scheduling a Lambda to be triggered every 5-10 minutes and having Athena calculate those aggregates sound like a reasonable approach? Are there any alternatives I should consider?

systemdebt

1 Answer

3

Kinesis Data Analytics runs Apache Flink, which supports tumbling windows. If you set the window size to 15 minutes, the windows start at 00:00, 00:15, etc. by default.
https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/datastream/operators/windows/#tumbling-windows
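
To illustrate why the windows land on :00, :15, :30 and :45, here is a minimal Python sketch of the alignment arithmetic. The Flink documentation linked above describes tumbling windows as aligned with the epoch, so a window's start is the event timestamp rounded down to a multiple of the window size counted from 1970-01-01 00:00 UTC. This is an illustration only and not Flink code; the names `window_start` and `window_bounds` are made up for this example.

```python
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(minutes=15)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def window_start(ts: datetime, size: timedelta = WINDOW) -> datetime:
    """Round a timezone-aware timestamp down to its epoch-aligned window start."""
    elapsed = ts - EPOCH
    # elapsed % size is how far ts sits past the previous window boundary
    return ts - (elapsed % size)


def window_bounds(ts: datetime, size: timedelta = WINDOW):
    """Return the [start, end) bounds of the window that contains ts."""
    start = window_start(ts, size)
    return start, start + size


if __name__ == "__main__":
    # An event at 13:13:42 UTC belongs to the window 13:00 to 13:15
    event = datetime(2022, 8, 17, 13, 13, 42, tzinfo=timezone.utc)
    start, end = window_bounds(event)
    print(start.isoformat(), end.isoformat())
```

Because 15 minutes divides an hour evenly, epoch-aligned windows always coincide with the quarter-hour marks, regardless of when the job was started.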

Since a 15-minute interval is quite long (no low-latency processing is needed), you could also consider writing an AWS Glue job (Apache Spark) and having it triggered periodically with built-in Glue triggers.

Or you can go with your current solution (Lambda/Athena).
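
As a rough sketch of the Lambda/Athena variant: a scheduled Lambda can work out the most recent fully elapsed quarter-hour window and ask Athena to aggregate only that range, so the results are aligned even though the trigger itself fires at arbitrary times. The table `iot_readings`, the columns `device_id`, `value` and `event_time`, the database `iot` and the S3 output location are all hypothetical placeholders, and the aggregate shown (average and count) is just an example.

```python
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(minutes=15)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def last_complete_window(now: datetime):
    """Return (start, end) of the most recent fully elapsed 15-minute window."""
    end = now - ((now - EPOCH) % WINDOW)  # round down to the quarter hour
    return end - WINDOW, end


def build_query(start: datetime, end: datetime) -> str:
    """Build the aggregation SQL; table and column names are hypothetical."""
    fmt = "%Y-%m-%d %H:%M:%S"
    return (
        "SELECT device_id, avg(value) AS avg_value, count(*) AS n "
        "FROM iot_readings "
        f"WHERE event_time >= TIMESTAMP '{start.strftime(fmt)}' "
        f"AND event_time < TIMESTAMP '{end.strftime(fmt)}' "
        "GROUP BY device_id"
    )


def handler(event, context):
    """Lambda entry point, assumed to be invoked on a schedule."""
    import boto3  # imported here so the helpers above work without AWS access

    start, end = last_complete_window(datetime.now(timezone.utc))
    athena = boto3.client("athena")
    return athena.start_query_execution(
        QueryString=build_query(start, end),
        QueryExecutionContext={"Database": "iot"},  # hypothetical database
        ResultConfiguration={
            "OutputLocation": "s3://example-bucket/athena-results/"  # hypothetical
        },
    )
```

One thing to test with this approach is late-arriving data: records that reach S3 after their window has already been aggregated would be missed unless the window is recomputed.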

One of the main decisions here is how much you need to invest in learning Spark or Flink versus using an Athena query, which (I assume) you already know. I would reserve a limited amount of time to test each approach before picking one. This way you can quickly see where things get complicated.

bzu
  • 1
    Thanks. For the tumbling window, my concern is how we make sure that we write out not just any window but windows rounded to the quarter hour, i.e. 1:00 to 1:15, 1:15 to 1:30, not 1:13 to 1:28, etc. @tzu – systemdebt Aug 17 '22 at 23:24
  • 2
    This is the default behaviour: if you set the window time to 15 min, it will always start at 00:00, 00:15 and so on (check the last paragraph of the tumbling windows section). – bzu Aug 18 '22 at 08:25
  • @systemdebt Just out of curiosity: which solution did you choose in the end? – bzu Dec 04 '22 at 16:43