3

I'm currently using kafka streams to collate related events within a window. In case if all the related events don't arrive within a window, is there a way in Kafka streams where we get a handle to the events that are expired. This would assist in handling/ notifying the downstream application that all the related events didn't arrive for collation. Appreciate your response.
Below are the examples
Example-1:
- GroupID: g1
- Events arrival: E1,10am; E2 10:01am and E3 10:02am
- Window: Session Window of inactivity duration of 5 mins.
- Result: All the events are collated successfully.

Example-2:
- Events arrival: E1,10am; E2 10:01am and E3 don't arrive
- Window: Session Window of inactivity duration of 5 mins.
- Result: Trigger an action OR get notified via a listener for partial collation upon window expiry for E1 and E2 at 10:06 am

vinay
  • 101
  • 2
  • 5
  • The question is a little unclear. Are you asking "Is there a way in Kafka streams where we get a handle to the events that are expired?" – Nathan Mar 31 '17 at 14:57
  • Yes, is there a way to get the handle to the events onExpiration of the window. – vinay Mar 31 '17 at 16:33

2 Answers2

1

Windows in Kafka Streams "don't expire" but are kept open to allow the handling of late arriving data.

Compare How to send final kafka-streams aggregation result of a time windowed KTable?

It's not possible to register any call-back,

  • not for the case that "stream time" advances and passed "window end time"
  • not for the case that a window if finally dropped (ie, after retention period did pass)
Community
  • 1
  • 1
Matthias J. Sax
  • 59,682
  • 7
  • 117
  • 137
  • Thanks Matt for the response. Is there an alternate way either via Kafka Streaming library or through interactive queries where we identify events when: a) When the window is dropped b) When the window time has elapsed. – vinay Apr 07 '17 at 05:40
  • You could use a dummy `transformValues` that just forward it's incoming data and register a punctuation schedule -- punctuation is based on internally tracked "stream time" and thus you can figure out if time passed beyond window-end time -- and if you put retention time into account you can also figure out when windows will be finally discarded. Thus, you could indirectly access the current window content via IQ like this -- but it's quite hacky and also not 100% precise. – Matthias J. Sax Apr 07 '17 at 17:06
  • @mat Windows in Kafka Streams do indeed expire when the window's "expiration"/defined "until" period passes and the state store eventually drops the window. So the notion that windows don't expired is a faulty one. From all the SO questions and forum topics asking about this it is clear to me that this is an oversight and a missing feature. Data doesn't live forever in the session store and we'd like to know what the value was as and when it's being expired. – akizl Oct 18 '17 at 01:23
  • I put "doesn't expire" in quotes -- my point was, that in contrast to other system that use triggers, Kafka Streams uses a different model. You are of course right, that it's not possible to maintain windows forever and that we apply a retention time. Thus, from my point of view, dropping a window is not a first class citizen and thus there is no API exposed. About to know what the value is: you get the latest update immediately, thus, you do know the value of expired windows. There is just no call back. Also, you can infer if a window should have been expired based on the progress... – Matthias J. Sax Oct 18 '17 at 15:50
  • on the progress of your applications. Window expiry is based on event time, and you thus you can estimate when a window gets expired based on record timestamp that you can access in your application. Of course, Apache Kafka is an open source project and new feature idea are very welcome. https://kafka.apache.org/contact and https://issues.apache.org/jira/projects/KAFKA – Matthias J. Sax Oct 18 '17 at 15:53
0

Have not tried it, but seems like window final results might do it https://kafka.apache.org/24/documentation/streams/developer-guide/dsl-api.html#window-final-results

The idea is to check if all events have arrived when the window closes and trigger some action if this is not the case.

Peter Dotchev
  • 2,980
  • 1
  • 25
  • 19