
I have an initial event with some key. I want to detect when no further events with the same key arrive within a fixed time interval (say, 60 seconds) after the initial event, and take some action immediately in that case. My first thought was to create a KSQL table with WINDOW SESSION, something like:

SELECT
    COUNT(*) as total,
    COLLECT_LIST(ts) AS ts_list,
    field1 as f1,
    field2 as f2,
    WINDOWEND as window_end,
    WINDOWSTART as window_start
FROM events_source_topic
WINDOW SESSION (60 SECONDS)
WHERE field3 = 'some_condition_string'
GROUP BY
    field1,
    field2;

As a result, I receive 2 messages if there are 2 events in the window, because by default the query emits on every change to a window. I'm not interested in the intermediate states of a window, so I tried EMIT FINAL:

SELECT
    COUNT(*) as total,
    COLLECT_LIST(ts) AS ts_list,
    field1 as f1,
    field2 as f2,
    WINDOWEND as window_end,
    WINDOWSTART as window_start
FROM events_source_topic
WINDOW SESSION (60 SECONDS)
WHERE field3 = 'some_condition_string'
GROUP BY
    field1,
    field2
EMIT FINAL;

According to the documentation, I should get only one message, when the window closes, and can then analyse how many events were inside. Unfortunately, I don't get this message right after 60 seconds have passed since the last event; I get it only after the first event of the next window arrives (for the same partition, I guess).

  1. I've found similar questions here and here, and it seems it was impossible to get a message right after a window's inactivity period, because KSQL windows are event-based, not time-based. Those were answered 2 years ago; has anything changed since then?
  2. Is there any other way to get an event a fixed period of time after the initial event, without organising scheduled/postponed calls on the client?

I've also tried decreasing the GRACE PERIOD of the window, but that doesn't work either.

1 Answer


Yes, this is a known issue with ksqlDB: https://github.com/confluentinc/ksql/issues/9760

I thought this was related to an issue I had, but it's likely a limitation of how event streaming works.

In short, ksqlDB has no way to know that time has passed other than the timestamp of the last message, because wall-clock time isn't always related to stream time (your stream might be processing data from 20 minutes ago). This is also explained in the documentation. You're looking at event time and wanting it to progress with ingestion time.
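To make the event-time point concrete, here is a minimal toy model in Python. It is not ksqlDB or Kafka Streams code; the class `ToySessionTask` and its fields are hypothetical names invented for this sketch, and grace period and out-of-order handling are ignored. It only illustrates the assumption that a session is reported final once stream time (the highest timestamp seen so far) moves more than the gap past the session's last event.

```python
from dataclasses import dataclass, field

GAP_MS = 60_000  # inactivity gap, mirroring WINDOW SESSION (60 SECONDS)


@dataclass
class ToySessionTask:
    """Toy model of one partition's windowing task. Not ksqlDB code."""

    stream_time: int = -1  # highest event timestamp observed so far
    open_sessions: dict = field(default_factory=dict)  # key -> (start, end, count)

    def process(self, key, ts):
        """Feed one record; return the sessions that became final because of it."""
        # Stream time only moves when a record arrives; the wall clock is never read.
        self.stream_time = max(self.stream_time, ts)

        closed = []
        for k, (start, end, count) in list(self.open_sessions.items()):
            if self.stream_time - end > GAP_MS:
                closed.append((k, start, end, count))
                del self.open_sessions[k]

        if key in self.open_sessions:
            start, end, count = self.open_sessions[key]
            self.open_sessions[key] = (min(start, ts), max(end, ts), count + 1)
        else:
            self.open_sessions[key] = (ts, ts, 1)
        return closed


task = ToySessionTask()
first = task.process("a", 0)        # opens a session for key "a"
second = task.process("a", 30_000)  # extends it; nothing is final yet
# However long we wait here in real time, nothing more is emitted.
third = task.process("b", 200_000)  # a later record finally closes "a"
```

In this model the session for key "a" is only returned by the third call, which matches the behaviour described in the question: the final result shows up when the next record arrives, not when 60 seconds of real time have elapsed.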

One popular workaround is to generate regular heartbeat events whose sole purpose is to advance your stream's time.
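The effect of heartbeats can be sketched with a small Python simulation. This is a toy model under stated assumptions, not ksqlDB code: `HEARTBEAT_KEY`, `with_heartbeats` and `finalization_times` are hypothetical names, records are assumed to be sorted by timestamp within one partition, and the grace period is ignored. In a real deployment the heartbeat records would presumably have to reach every partition the query reads, and whether they must also pass the query's WHERE filter is something to verify against your ksqlDB version.

```python
GAP_MS = 60_000                  # inactivity gap, mirroring WINDOW SESSION (60 SECONDS)
HEARTBEAT_KEY = "__heartbeat__"  # hypothetical reserved key for synthetic records


def with_heartbeats(events, interval_ms, until_ms):
    """Merge (key, ts) events with one synthetic heartbeat every interval_ms."""
    beats = [(HEARTBEAT_KEY, t) for t in range(0, until_ms + 1, interval_ms)]
    # sorted() is stable, so a real event precedes a heartbeat with the same timestamp.
    return sorted(events + beats, key=lambda record: record[1])


def finalization_times(records):
    """Return {key: stream time at which the key's first session was reported final}."""
    last_seen = {}  # key -> timestamp of the latest event in its open session
    finalized = {}
    for key, ts in records:
        # Any record, heartbeat or not, advances stream time and may close sessions.
        for k, end in list(last_seen.items()):
            if ts - end > GAP_MS:
                finalized.setdefault(k, ts)
                del last_seen[k]
        if key != HEARTBEAT_KEY:  # heartbeats never open a session themselves
            last_seen[key] = ts
    return finalized


events = [("order-1", 0), ("order-2", 300_000)]
without = finalization_times(events)
with_beats = finalization_times(with_heartbeats(events, 10_000, 300_000))
```

With these inputs, `without` reports "order-1" as final only at stream time 300000 (when "order-2" arrives), while `with_beats` reports it at 70000, the first heartbeat more than 60 seconds after the last real event. The heartbeat interval therefore bounds how late the final result can be.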

Other libraries, e.g. Flink, use the concept of a watermark to trigger processing of windows, which lets you generate watermarks that advance your stream without actual events in it. To my knowledge, no such thing exists in ksqlDB. For further reading, the Flink documentation covers this topic in a fair amount of detail.

Still curious to see what the ksqldb team has to say about this, given that this is something I'm also going to have to deal with in some of my projects.

Also, the final comment on the issue says:

"This is not currently on the ksqlDB road map"

which means that, for the time being, the heartbeat solution is the practical option.

Neenad
  • A link to a solution is welcome, but please ensure your answer is useful without it: [add context around the link](//meta.stackexchange.com/a/8259) so your fellow users will have some idea what it is and why it is there, then quote the most relevant part of the page you are linking to in case the target page is unavailable. [Answers that are little more than a link may be deleted.](/help/deleted-answers) – Daniel Widdis Feb 23 '23 at 22:58