
I am having real problems getting the AWS IoT Analytics Delta Window (docs) to work.

I am trying to set it up so that every hour a query is run to get the last 1 hour of data only. According to the docs, the schedule feature can be used to run the query using a cron expression (in my case every hour) and the delta window should restrict my query to only include records that fall within the specified time window (in my case the last hour).

The SQL query I am running is simply SELECT * FROM dev_iot_analytics_datastore, and if I don't include any delta window I get the records as expected. Unfortunately, when I include a delta expression I get nothing (ever). I have left the data accumulating for about 10 days now, so there are a couple of million records in the datastore. Given that I was unsure what the optimal format would be, I have included the following temporal fields in the entries:

datetime  :  2019-05-15T01:29:26.509
(A string formatted using ISO Local Date Time)

timestamp_sec  :  1557883766
(A unix epoch expressed in seconds)

timestamp_milli  :  1557883766509
(A unix epoch expressed in milliseconds)

There is also a value automatically added by AWS called __dt which uses the same format as my datetime field, except it is only accurate to within 1 day; i.e. all values entered within a given day have the same value (e.g. 2019-05-15 00:00:00.00).
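For reference, the three fields above can all be derived from a single instant. A minimal Python sketch using the sample values from the question (assuming the device clock is UTC, which the question does not actually state):

```python
from datetime import datetime, timezone

# The sample instant from the question, interpreted as UTC (an assumption).
dt = datetime(2019, 5, 15, 1, 29, 26, 509000, tzinfo=timezone.utc)

# ISO local-date-time string, truncated to milliseconds
datetime_str = dt.strftime("%Y-%m-%dT%H:%M:%S.") + f"{dt.microsecond // 1000:03d}"

# Unix epoch in seconds, and in milliseconds
timestamp_sec = int(dt.timestamp())
timestamp_milli = timestamp_sec * 1000 + dt.microsecond // 1000

print(datetime_str)     # 2019-05-15T01:29:26.509
print(timestamp_sec)    # 1557883766
print(timestamp_milli)  # 1557883766509
```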

I have tried a range of expressions (including the suggested AWS expression) from both standard SQL and Presto as I'm not sure which one is being used for this query. I know they use a subset of Presto for the analytics so it makes sense that they would use it for the delta but the docs simply say '... any valid SQL expression'.

Expressions I have tried so far with no luck:


from_unixtime(timestamp_sec)
from_unixtime(timestamp_milli)
cast(from_unixtime(timestamp_sec) as date)
cast(from_unixtime(timestamp_milli) as date)
date_format(from_unixtime(timestamp_sec), '%Y-%m-%dT%h:%i:%s')
date_format(from_unixtime(timestamp_milli), '%Y-%m-%dT%h:%i:%s')
from_iso8601_timestamp(datetime)

tarka

5 Answers


What are the offset and time expression parameters that you are using?

Since delta windows are effectively filters inserted into your SQL, you can troubleshoot them by manually inserting the filter expression into your data set's query.

Namely, applying a delta window filter with a -3 minute (negative) offset and 'from_unixtime(my_timestamp)' time expression to a 'SELECT my_field FROM my_datastore' query translates to the equivalent query:

SELECT my_field FROM
    (SELECT * FROM "my_datastore" WHERE
        (__dt between date_trunc('day', iota_latest_succeeded_schedule_time() - interval '1' day)
                  and date_trunc('day', iota_current_schedule_time() + interval '1' day)) AND
        iota_latest_succeeded_schedule_time() - interval '3' minute < from_unixtime(my_timestamp) AND
        from_unixtime(my_timestamp) <= iota_current_schedule_time() - interval '3' minute)

Try using a similar query (with no delta time filter) with the correct values for offset and time expression and see what you get. The (__dt between ...) clause is just an optimization for limiting the scanned partitions; you can remove it for the purposes of troubleshooting.

Roger
  • Hi Roger, I did expect the delta to be constructed into an SQL query. This query results in the last 1 hour of data. `SELECT * FROM dev_iot_analytics_datastore WHERE __dt >= current_date - interval '1' day AND from_unixtime(timestamp) > current_timestamp - interval '1' hour`. However, this doesn't seem to work when added to the delta window expression. – tarka Jun 03 '19 at 10:42

Please try the following:

  1. Set query to SELECT * FROM dev_iot_analytics_datastore
  2. Data selection filter:
    • Data selection window: Delta time
    • Offset: -1 Hours
    • Timestamp expression: from_unixtime(timestamp_sec)
  3. Wait for dataset content to run for a bit, say 15 minutes or more.
  4. Check contents
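The same settings can also be expressed through the API. The sketch below just builds the parameters that would be passed to the boto3 iotanalytics client's create_dataset call, so the shape can be inspected without touching AWS; the dataset and action names are hypothetical, and the datastore name and timestamp expression are the ones from the question:

```python
# Parameters for iotanalytics.create_dataset (boto3), built as a plain dict.
params = {
    "datasetName": "dev_iot_analytics_dataset",   # hypothetical name
    "actions": [{
        "actionName": "hourly_query",             # hypothetical name
        "queryAction": {
            "sqlQuery": "SELECT * FROM dev_iot_analytics_datastore",
            "filters": [{
                "deltaTime": {
                    # -1 hour offset, as in step 2 above
                    "offsetSeconds": -3600,
                    "timeExpression": "from_unixtime(timestamp_sec)",
                }
            }],
        },
    }],
    # Run every hour
    "triggers": [{"schedule": {"expression": "cron(0 * * * ? *)"}}],
}

# With a configured session this would be submitted as:
#   boto3.client("iotanalytics").create_dataset(**params)
```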
Populus

After several weeks of testing, and trying all the suggestions in this post along with many more, it appears that the extremely technical answer was to 'switch it off and back on'. I deleted the whole analytics stack, rebuilt everything with different names, and it now seems to be working!

It's important to note that even though I have flagged this as the correct answer (because it was the actual resolution), the answers provided by @Populus and @Roger would both have been correct had my deployment been functioning as expected.

tarka

I found by chance that changing SELECT * FROM datastore to SELECT id1, id2, ... FROM datastore solved the problem.

Edit 20/03:

I've discovered the root cause (at least in my case).

As it's subtle, let me try to explain:

The query is still SELECT * FROM datastore, the delta window is based on from_unixtime(timestamp/1000), and the schedule is the cron expression cron(0/1 * * * ? *).

The "timestamp" variable (in the Data Window) is defined during the IoT Core Rule SQL : SELECT *, timestamp() as timestamp ...

The misunderstanding is that there is a DELAY between the IoT Core rule execution and the presence of the message in the datastore (it could be a minute or so). The timestamp therefore records when the message arrives at IoT Core and NOT when the message arrives IN the datastore.

The IoT Analytics SQL query is then based on this "timestamp". When the query executes, it looks for message timestamps between t - 1m = iota_latest_succeeded_schedule_time() and t = now(). A message received by IoT Core COULD BE INCLUDED in this window BUT still not be present in the datastore. The result was empty, so I thought the delta window was buggy :/ That is not the case; the message is just on its way to the datastore.

In my case the problem shows up often because the schedule period (1 minute) is around the same duration as the message propagation delay. With a much longer period (a day, for example) only the occasional message would be lost, which would make the cause very hard to understand.

Workaround: I've created a Lambda activity in the IoT Analytics pipeline to add a new timestamp (ts_lambda) to each message, at the very end of the Lambda execution (right before the message enters the datastore).

Then the Delta Window is now based on : from_unixtime(ts_lambda/1000)

Everything is OK now; no more messages are lost.
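A minimal sketch of such a pipeline Lambda activity, assuming the standard IoT Analytics contract where the event is a batch (list) of messages and the return value is the transformed batch (the field name ts_lambda is from the answer; everything else is illustrative):

```python
import time

def handler(event, context):
    """IoT Analytics pipeline Lambda activity.

    event is a list of message dicts; the return value is the
    transformed list that continues down the pipeline.
    """
    now_ms = int(time.time() * 1000)
    for message in event:
        # Stamp each message just before it continues to the datastore,
        # so the delta window can filter on arrival time rather than the
        # (earlier) IoT Core timestamp.
        message["ts_lambda"] = now_ms
    return event
```

The delta window's time expression then becomes from_unixtime(ts_lambda/1000), as described above.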

Vincent

Using these functions in the SQL query solved the issue:

  • iota_current_schedule_time(): returns the timestamp of the current query.
  • iota_latest_succeeded_schedule_time(): returns the timestamp of the previous query that succeeded.

So, with the next query:

SELECT <field_1>, <field_2>, <...>
FROM <datastore_name>
WHERE from_unixtime(<timestamp_field>) BETWEEN
    iota_latest_succeeded_schedule_time() AND
    iota_current_schedule_time()

This fetches the data stored in the datastore between the previous succeeded query and the current query, which is the same behaviour as the delta window. Note that BETWEEN expects the lower bound first, so iota_latest_succeeded_schedule_time() must come before iota_current_schedule_time(). The from_unixtime() function returns the same type as iota_current_schedule_time() and iota_latest_succeeded_schedule_time().

You can find more information about the timestamp type here: https://prestodb.io/docs/current/functions/datetime.html

amder