
The time-series data is being produced to a Kafka topic. I need to read each record, decorate it with some data from the database, and then call a REST API. Once the response is received, I want to write the result to a Kafka topic. How can I do this efficiently and scalably with the Kafka Streams API?

Steps:

  • Start reading the input topic.
  • Call `mapValues` to make a database call and decorate the record with the additional data.
  • Make a REST API call with the decorated record as the request and get the response.
  • Write the record to the output Kafka topic.

I think there are two bottlenecks in the above algorithm:

  • Making database calls would slow it down. This can be mitigated by caching the metadata and loading it from the database only on a cache miss, or by using a state store.

  • Making the REST API call synchronously would also slow it down.
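The caching idea in the first bullet is the classic cache-aside pattern. Below is a minimal plain-JDK sketch of it, assuming the metadata is keyed by the record key and fits in memory. `MetadataCache` and its `databaseLoader` function are hypothetical stand-ins for your own lookup code, not part of Kafka Streams. It has no eviction or expiry, so stale or unbounded metadata would need a real cache library or a TTL on top.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Cache-aside lookup: serve metadata from memory, query the database only on a miss. */
public class MetadataCache {
    // Unbounded in-memory cache (no eviction or TTL in this sketch).
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    // Hypothetical stand-in for the real database query.
    private final Function<String, String> databaseLoader;

    public MetadataCache(Function<String, String> databaseLoader) {
        this.databaseLoader = databaseLoader;
    }

    /** Returns cached metadata for the key, calling the loader only if it is absent. */
    public String get(String key) {
        return cache.computeIfAbsent(key, databaseLoader);
    }

    /** Decorates a record value with its metadata, as the first mapValues step would. */
    public String decorate(String key, String value) {
        return value + "|" + get(key);
    }
}
```

In a topology this could be called from the key-aware overload, e.g. `records.mapValues((key, value) -> cache.decorate(key, value))`, so only the first record per key pays the database round trip.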

    final KStream<String, String> records = builder.stream(InputTopic);

    // This is bad: both steps block the stream thread on every record
    final KStream<String, String> output = records
      .mapValues(value -> { /* cache hit, otherwise database call */ return value; })
      .mapValues(value -> { /* prepare HTTP request and convert the HTTP response */ return value; });
    output.to(OutputTopic);
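A stream thread processes its records one at a time, so a blocking call inside `mapValues` stalls everything behind it. One general way to overlap slow calls while keeping same-key records in order is to chain each key's calls on its own future, letting different keys run concurrently. The sketch below is plain JDK code. `KeyOrderedAsyncCaller` is a hypothetical helper, not a Kafka Streams API, and `call` stands in for the REST request. If you drive it from a topology and emit results from another thread (e.g. with a separate producer), Kafka Streams no longer knows about these in-flight records when it commits offsets, so a crash can lose them unless you add your own tracking.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.function.BiConsumer;
import java.util.function.Function;

/** Runs slow calls asynchronously while preserving submission order per key. */
public class KeyOrderedAsyncCaller {
    // Tail of the pending-call chain for each key (grows with the key space; no cleanup here).
    private final Map<String, CompletableFuture<Void>> tails = new ConcurrentHashMap<>();
    private final Executor executor;

    public KeyOrderedAsyncCaller(Executor executor) {
        this.executor = executor;
    }

    /**
     * Schedules call(value) to start only after the previous record with the same key
     * has finished and been handed to onResult. onResult must be thread-safe because
     * different keys complete on different threads.
     */
    public CompletableFuture<Void> submit(String key, String value,
                                          Function<String, String> call,
                                          BiConsumer<String, String> onResult) {
        return tails.compute(key, (k, tail) -> {
            CompletableFuture<Void> previous = (tail == null)
                ? CompletableFuture.<Void>completedFuture(null)
                // A failed call must not block later records for the same key.
                : tail.exceptionally(error -> null);
            return previous
                .thenApplyAsync(ignored -> call.apply(value), executor)
                .thenAccept(response -> onResult.accept(k, response));
        });
    }
}
```

The pool size then bounds how many REST requests are in flight, independently of the number of stream threads.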

The code above ties throughput directly to the latency of the database call and the REST API: if either takes longer to complete, throughput drops. Records with the same key must not be processed out of order. The expected throughput is about 1 million records per minute. While one record is waiting on the REST API, it is fine to make database calls for other records concurrently.
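A quick Little's-law estimate shows why the synchronous version cannot keep up. This reads "1m/minute" as one million records per minute, and the 50 ms REST latency is purely an assumed figure for illustration:

```latex
\begin{aligned}
\lambda &= \frac{10^6\ \text{records}}{60\ \text{s}} \approx 16{,}667\ \text{records/s} \\
W &= 50\ \text{ms} = 0.05\ \text{s} \quad (\text{assumed REST latency}) \\
L &= \lambda W \approx 16{,}667 \times 0.05 \approx 833\ \text{requests in flight} \\
\text{one blocking thread} &: \ 1/W = 20\ \text{records/s}
\;\Rightarrow\; \lambda / 20 \approx 833\ \text{threads}
\end{aligned}
```

Because Kafka Streams parallelism is bounded by the number of input partitions (one task per partition), getting roughly 833 concurrent blocking calls would need on the order of 833 partitions and stream threads, which is why the remote calls either have to become asynchronous or be replaced by local lookups.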

I'm not sure how to write a topology that can scale in this scenario. I am new to Kafka Streams.

IceBurger
Mac
  • One recommended approach is to pull your database/API data into a KTable, then join the stream on some ID. – OneCricketeer Nov 12 '19 at 06:53
  • +1 stream the database into Kafka and do the join locally (cf. https://rmoff.dev/codetalks19-no-more-silos-video) – Robin Moffatt Nov 12 '19 at 10:14
  • Have you read my answer at https://stackoverflow.com/a/49771142/1743580? – miguno Nov 12 '19 at 10:19
  • @MichaelG.Noll Thanks for the reference. The answer talks about pros and cons. I was looking for different ways to solve the problem. – Mac Nov 13 '19 at 12:50
  • @RobinMoffatt Nice talk, but it does not help to solve my problem. – Mac Nov 13 '19 at 12:51
  • @Mac: Well. You got two "answers": import the data into a `KTable` or deal with the remote calls and the corresponding tradeoffs. Not sure what else you expect to get as an answer? – Matthias J. Sax Nov 14 '19 at 02:43
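The "stream the database into Kafka and join locally" suggestion could look roughly like the sketch below. It assumes the database table is already mirrored into a compacted topic (for example by a CDC connector), that this topic is keyed by the same key as the time-series records, and that everything is String-serialized; the topic names and application id are hypothetical. A `GlobalKTable` is used so the join needs no co-partitioning (a `KTable` join also works if both topics are co-partitioned). `join` drops records with no matching metadata; `leftJoin` would keep them. This only removes the database lookup; the REST call still has to be handled separately.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.GlobalKTable;
import org.apache.kafka.streams.kstream.KStream;

public class EnrichmentTopology {
    // Hypothetical topic names.
    static final String INPUT_TOPIC = "timeseries-input";
    static final String METADATA_TOPIC = "device-metadata"; // database table mirrored into Kafka
    static final String OUTPUT_TOPIC = "timeseries-enriched";

    static Topology build() {
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> records = builder.stream(INPUT_TOPIC);
        // Every instance keeps a full local copy of the metadata: lookups never leave the process.
        GlobalKTable<String, String> metadata = builder.globalTable(METADATA_TOPIC);
        records
            .join(metadata,
                  (key, value) -> key,               // assumption: metadata is keyed by the record key
                  (value, meta) -> value + "|" + meta) // decorate the record with its metadata
            .to(OUTPUT_TOPIC);
        return builder.build();
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "timeseries-enricher");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        KafkaStreams streams = new KafkaStreams(build(), props);
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        streams.start();
    }
}
```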
