6

So I have...

  • 1st topic has general application logs (log4j). It stores things like HTTP API requests/responses, warnings, exceptions, etc. There can be multiple logs associated with one logical business request. (These logs happen within seconds of each other.)
  • 2nd topic contains commands from the above business request, which other services take action on. (The commands also happen within seconds of each other, but maybe a couple of minutes after the original request.)
  • 3rd topic contains events generated from the actions of those other services. (Most events complete within seconds, but some can take up to 3-5 days to be received.)

So a single logical business request can have multiple logs, commands and events associated with it by a uuid which the microservices pass to each other.

So what are some of the technologies/patterns that can be used to read the 3 topics, join them all together as a single JSON document, and then dump them to, let's say, Elasticsearch?

Streaming?

Matthias J. Sax
user432024

3 Answers

9

You can use Kafka Streams, or KSQL, to achieve this. Which one depends on your preference/experience with Java, and also the specifics of the joins you want to do.

KSQL is the SQL streaming engine for Apache Kafka, and with SQL alone you can declare stream processing applications against Kafka topics. You can filter, enrich, and aggregate topics. Currently only stream-table joins are supported. You can see an example in this article here

The Kafka Streams API is part of Apache Kafka, and a Java library that you can use to do stream processing of data in Apache Kafka. It is actually what KSQL is built on, and supports greater flexibility of processing, including stream-stream joins.
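To make the timing concern concrete, here is a minimal plain-Python sketch of what a windowed stream-stream join does conceptually. It does not use the Kafka Streams API; the function `windowed_join`, the record layout `(key, timestamp, value)` and the sample data are all assumptions made up for illustration.

```python
from collections import defaultdict

def windowed_join(left, right, window_seconds):
    """Pair left/right records that share a key and whose timestamps
    differ by at most window_seconds. Records are (key, ts, value)."""
    right_by_key = defaultdict(list)
    for key, ts, value in right:
        right_by_key[key].append((ts, value))

    joined = []
    for key, ts, value in left:
        for r_ts, r_value in right_by_key[key]:
            if abs(ts - r_ts) <= window_seconds:
                joined.append((key, value, r_value))
    return joined

DAY = 24 * 60 * 60

# Hypothetical sample data: req-2's event arrives four days after its command.
commands = [("req-1", 0, "charge-card"), ("req-2", 10, "ship-order")]
events = [("req-1", 5, "card-charged"), ("req-2", 4 * DAY, "order-delivered")]

# A one-minute window only matches req-1; a five-day window matches both.
print(windowed_join(commands, events, window_seconds=60))
print(windowed_join(commands, events, window_seconds=5 * DAY))
```

In this toy model the join window has to cover the slowest event, which is why the 3-5 day delay mentioned in the question matters when choosing a window.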

Robin Moffatt
  • Ok so I'm guessing streaming is the way to go... But I need more info around the "how", based on timing requirements (explained in my question). – user432024 Mar 12 '18 at 14:20
  • Kafka persists data. So long as your retention period is enough to meet your business requirements, you'll be able to do the joins you want to. – Robin Moffatt Mar 12 '18 at 14:21
  • Ok, so the business request logs a few lines now, within seconds of each other, and an event is received 4 days later from a 3rd party. So do I just set a window of, say, a few days? And will it mean that the lot will only be processed when the last event comes in? – user432024 Mar 12 '18 at 14:23
  • Ok. So there is a simpler solution which doesn't need streaming: I can simply read each topic and do upserts to the destination. I do not need streams just to join JSON. Furthermore, messages coming in as late as 4 days later make windowing even harder. – user432024 Mar 16 '18 at 00:27
  • If you can only join 2 streams at a time or 1 table and 1 stream, combining 3 streams (say A, B and C) would need A joined with B to get AB, and then AB joined with C to get ABC. This is doable for a small number, but what if you need to combine a larger number, say 10 or more streams? Is this still the way to do this? – Kevin Hooke Dec 10 '19 at 22:42
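The chaining described in the last comment (A joined with B to get AB, then AB joined with C) can be sketched in plain Python as a fold over pairwise joins. This is only an illustration of the shape of the pattern, not Kafka Streams or KSQL code; `join_by_key` and the sample documents are hypothetical.

```python
from functools import reduce

def join_by_key(left, right):
    """Inner-join two {uuid: document} dicts by merging the documents
    that share a uuid."""
    shared = left.keys() & right.keys()
    return {key: {**left[key], **right[key]} for key in shared}

# Hypothetical per-topic views, already grouped by uuid.
logs = {"req-1": {"logs": ["POST /orders 201"]}}
commands = {"req-1": {"commands": ["charge-card"]}}
events = {"req-1": {"events": ["card-charged"]}}

# Equivalent to join_by_key(join_by_key(logs, commands), events).
combined = reduce(join_by_key, [logs, commands, events])
print(combined)
```

The same fold works for any number of inputs, but this sketch says nothing about what each extra join step would cost in a real stream processor.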
3

You can use KSQL to join the streams.

  1. KSQL has two constructs: tables and streams.
  2. Currently, joins are supported only between a stream and a table, so you need to decide which of your topics is a good fit for which construct.
  3. You don't need windowing for stream-table joins.
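As a rough illustration of why a stream-table join needs no window, here is a plain-Python sketch: the table side keeps only the latest value per key, so a stream record can be enriched whenever it arrives. This is not KSQL; `stream_table_join`, the `(timestamp, key, value)` layout and the sample records are assumptions for the example.

```python
def stream_table_join(stream, table_updates):
    """Enrich each stream record with the latest table row for its key.
    Inputs are (ts, key, value) tuples, processed in timestamp order."""
    table = {}
    output = []
    # Tag table updates with 0 and stream records with 1 so that, on equal
    # timestamps, the table update is applied first.
    merged = sorted(
        [(ts, 0, key, value) for ts, key, value in table_updates]
        + [(ts, 1, key, value) for ts, key, value in stream]
    )
    for ts, kind, key, value in merged:
        if kind == 0:
            table[key] = value          # upsert the latest row for this key
        elif key in table:
            output.append((key, value, table[key]))  # inner join: skip unknown keys
    return output

# Hypothetical data: the second event arrives four days after the request.
requests_table = [(0, "req-1", "order #42")]
events_stream = [
    (5, "req-1", "card-charged"),
    (4 * 24 * 60 * 60, "req-1", "order-delivered"),
    (7, "req-2", "orphan-event"),
]
print(stream_table_join(events_stream, requests_table))
```

The four-day-late event still finds its row because the table entry does not expire in this model; in Kafka the equivalent guarantee depends on how the table's backing topic is retained.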

Benefits of using KSQL.

  1. KSQL is easy to set up.
  2. KSQL is a SQL dialect, which helps you query your data quickly.

Drawbacks.

  1. It's not production ready yet, but a release is coming up in April 2018.
  2. It's a little buggy right now, but will certainly improve in a few months.

Please have a look.

https://github.com/confluentinc/ksql

Zamir Arif
  • I'm wondering if I even need streams. All I need is to make sure the documents are joined in Elasticsearch when they arrive in the queue. Since messages can come in right away or a few days later, wouldn't windowing make this difficult? – user432024 Mar 20 '18 at 16:42
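The non-streaming alternative raised in the comments (read each topic independently and upsert into one document per uuid) can be sketched as follows. `DocumentStore` is a hypothetical in-memory stand-in for the destination index, not an Elasticsearch client, and the field names and sample records are assumptions.

```python
from collections import defaultdict

class DocumentStore:
    """In-memory stand-in for the destination index (hypothetical)."""

    def __init__(self):
        # One document per uuid, with one list per source topic.
        self.docs = defaultdict(
            lambda: {"logs": [], "commands": [], "events": []}
        )

    def upsert(self, uuid, field, item):
        # Creates the document on first sight of the uuid, otherwise appends.
        self.docs[uuid][field].append(item)

def consume(store, topic_field, records):
    """Apply every (uuid, payload) record from one topic to the store."""
    for uuid, payload in records:
        store.upsert(uuid, topic_field, payload)

store = DocumentStore()
consume(store, "logs", [("req-1", "POST /orders 201"), ("req-1", "WARN slow response")])
consume(store, "commands", [("req-1", "charge-card")])
# An event arriving days later still lands in the same document.
consume(store, "events", [("req-1", "card-charged")])
print(dict(store.docs["req-1"]))
```

Because each record is merged on arrival, no window is involved; the trade-off is that the document is incomplete until the last record shows up, and a real destination would need its upserts to be safe under concurrent writers.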
-2

Same as the question "Is it possible to use multiple left join in Confluent KSQL query? tried to join stream with more than 1 tables , if not then whats the solution?". It also seems that you cannot have multiple join keys within the same query.

Slim Bouguerra