I use Kafka Streams to process real-time data. In the Kafka Streams tasks, I need to query data from MySQL and call another RESTful service.

All the operations are synchronous.

I'm afraid the synchronous calls will reduce the processing capacity of the Streams tasks.

Is this good practice? Or is there a better way to do this?

shangyin
NingLee
  • It depends on what you are trying to achieve and which operations are needed at the DB level. However, I'd suggest having a look at the Kafka JDBC connector so that you can stream data from MySQL to Kafka and then do whatever you want with Kafka Streams. – Giorgos Myrianthous Aug 21 '18 at 12:21
  • @GiorgosMyrianthous I edited the question to describe my problem more clearly, many thanks. – NingLee Aug 21 '18 at 12:33
  • What do you need to do with the data queried from MySQL? – Giorgos Myrianthous Aug 21 '18 at 12:34
  • For some old systems, the data is stored in MySQL. I need to query MySQL for some configurations and so on. – NingLee Aug 21 '18 at 12:37
  • As mentioned already: the recommendation is to use Kafka Connect to load the data into a topic, and read the data as a `KTable` (or maybe `GlobalKTable`) into your application. You can do lookups via joins. – Matthias J. Sax Aug 21 '18 at 16:20
  • @MatthiasJ.Sax OK, I see. But is it good to call a third-party RESTful service in a Kafka Streams task? – NingLee Aug 22 '18 at 00:49
  • Calling external services can impact performance and is not recommended if it can be avoided. But if there is no other way, it's not a deal breaker. The calls should be synchronous though: with async calls you might run into correctness issues, and it's hard to write custom code to fix this. There is already a KIP to add native support for this in Streams: https://cwiki.apache.org/confluence/display/KAFKA/KIP-311%3A+Async+processing+with+dynamic+scheduling+in+Kafka+Streams – Matthias J. Sax Aug 22 '18 at 05:09
  • See also the question at https://stackoverflow.com/questions/49757709/kafka-streams-is-calling-rest-service-in-map-operator-considered-an-anti-pat/ – miguno Aug 22 '18 at 09:42

1 Answer

A better way to do it would be to stream your MySQL table(s) into Kafka, and access the data there. This has the advantage of decoupling your streams app from the MySQL database. If you moved away from MySQL in the future, so long as the data were still written to the Kafka topic from wherever it subsequently lived, your streams app would be unaffected. If it's just configurations you're storing in MySQL, you could even adopt the pattern that some people use of using Kafka as the primary store for data (using log compaction, to retain it forever).
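
To make the idea concrete, here is a minimal plain-Java sketch of the pattern, with no Kafka dependency. It only models the behaviour: config records are replayed into a local table that keeps the latest value per key (which is what a compacted topic retains), and each event is enriched by a local lookup instead of a synchronous MySQL query. The class names, keys, and values are all hypothetical. In a real Kafka Streams application the table would presumably come from `StreamsBuilder#table` or `StreamsBuilder#globalTable`, and the lookup would be a `KStream` join.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Plain-Java sketch (no Kafka dependency): models reading config from a
// compacted topic into a local table and enriching events by local lookup.
// All class names, keys, and values here are hypothetical.
public class ConfigLookupSketch {

    // Local view of the config topic: keeps only the latest value per key,
    // which is also what log compaction retains over time.
    static final class LocalTable {
        private final Map<String, String> latest = new HashMap<>();

        // Apply one changelog record; a null value acts as a delete (tombstone).
        void apply(String key, String value) {
            if (value == null) {
                latest.remove(key);
            } else {
                latest.put(key, value);
            }
        }

        Optional<String> get(String key) {
            return Optional.ofNullable(latest.get(key));
        }
    }

    // An incoming stream record that needs config looked up by key.
    static final class Event {
        final String configKey;
        final String payload;

        Event(String configKey, String payload) {
            this.configKey = configKey;
            this.payload = payload;
        }
    }

    // Inner-join style enrichment: events with no matching config are dropped.
    static List<String> enrich(List<Event> events, LocalTable table) {
        List<String> out = new ArrayList<>();
        for (Event e : events) {
            table.get(e.configKey)
                 .ifPresent(cfg -> out.add(e.payload + " [" + cfg + "]"));
        }
        return out;
    }

    public static void main(String[] args) {
        LocalTable table = new LocalTable();
        table.apply("tenant-a", "region=eu");
        table.apply("tenant-a", "region=us"); // newer record replaces the older one
        table.apply("tenant-b", "region=ap");
        table.apply("tenant-b", null);        // tombstone removes tenant-b

        List<Event> events = List.of(
                new Event("tenant-a", "order-1"),
                new Event("tenant-b", "order-2"));

        System.out.println(enrich(events, table));
    }
}
```

The point of the sketch is that the lookup happens against local state, so the stream task never blocks on a remote database call. Whether an inner join (dropping unmatched events) or a left join is appropriate depends on the application.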

Robin Moffatt