
I would like to harness the speed and power of Apache Kafka to replace REST calls in my Java application.

My app has the following architecture:

Producer P1 writes a search query to topic search

Consumer C1 reads/consumes the search query and produces search results which it writes to another topic search_results.

Both Producer P1 and Consumer C1 are part of a group of producers/consumers living on different physical machines. I need the Producer P1 server to be the one to consume/read the search results output produced by Consumer C1 so it can serve the search results to the client who submitted the search query.

The above example was simplified for demonstration purposes - in reality the process entails several additional intermediate Producers and Consumers where the query may be thrown to/from multiple servers to be processed. The main point is that the value produced by the last Producer needs to be read/consumed by the first Producer.

In the typical Apache Kafka architecture, there's no way to ensure that the final output is read by the server that originally produced the search query - as there are multiple servers reading the same topic.

I do not want to use REST for this purpose because it is very sloooooow when processing thousands of queries. Apache Kafka can handle millions of queries with 10 millisecond latency. In my particular use case it is critical that the query is transmitted with sub-millisecond speed. Scaling with REST is also much more difficult. Suppose our traffic increases and we need to add a dozen more servers to intercept client queries. With Apache Kafka it's as simple as adding new servers and adding them to the Producer P1 group. With REST it's not so simple. Apache Kafka also provides a very high level of decoupling, which REST does not.

What design/architecture can be used to force a specific server/producer to consume the end result of the initial query?

Thanks

OneCricketeer
  • if you're looking at interprocess communication, do check out grpc where you can set up bi-directional streaming. – CruelEngine Mar 14 '22 at 04:55
  • Just a side question: have you asked yourself "exactly what's slow" in your REST implementation? Porting that latency to your Kafka-based implementation may be all too easy. – ernest_k Mar 14 '22 at 05:00
  • @ernest_k IMHO, HTTP is slooooooooooooooooooow, no matter how you dice it. In what REST implementation can you achieve sub-millisecond latency and process millions of queries/messages per second? I would love to know if something like this exists out there. – Levi Hibbitts Mar 14 '22 at 05:22
  • @LeviHibbitts That's a fair point. But you must be talking about the usual HTTP frameworks, which are typically designed to be bulky in ways that impede scale. To address your comment: I'm failing to see how a lean HTTP client/server architecture will fail to be equally fast (leaving aside native distributed computing capability). Besides that, the main reason for my comment is that it wouldn't matter whether one used REST or Kafka if the latency is introduced by the server-side code (slow database connections, sub-optimal implementation, etc.). – ernest_k Mar 14 '22 at 05:45
  • Kafka doesn't handle "queries". If you want to actually "search" something, have you considered Elasticsearch and its REST API rather than introducing async request-response or tightly coupling your publish-subscribe mechanisms? – OneCricketeer Mar 14 '22 at 19:31

2 Answers


In the typical Apache Kafka architecture, there's no way to ensure that the final output is read by the server that originally produced the search query - as there are multiple servers reading the same topic.

You can use a custom partitioner in your producer to determine which partition each search query lands in.

Similarly, you can use a custom partition assignor in the consumer to determine which partitions are assigned to which consumer. The relevant consumer configuration is partition.assignment.strategy.
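
A minimal sketch of the routing idea in plain Java, with no Kafka client involved. It swaps the custom partitioner and assignor for explicit partition numbers, a simpler variant of the same idea: each P1 instance is assumed to own one partition of search_results and stamps that partition number on every query, so the last stage can address the result back to it. The class and method names (ReplyPartitionRouting, replyPartitionFor, SearchRequest, SearchResult, handle) and the instance IDs are hypothetical. With the real client, the result would likely be sent using the ProducerRecord constructor that takes an explicit partition, and each P1 instance would call KafkaConsumer.assign for its own partition rather than subscribing through a consumer group.

```java
import java.util.List;

final class ReplyPartitionRouting {

    /** Hypothetical request envelope: the query plus where the answer must go. */
    static final class SearchRequest {
        final String correlationId;
        final int replyPartition;
        final String query;

        SearchRequest(String correlationId, int replyPartition, String query) {
            this.correlationId = correlationId;
            this.replyPartition = replyPartition;
            this.query = query;
        }
    }

    /** Hypothetical result envelope, addressed to one partition of search_results. */
    static final class SearchResult {
        final String correlationId;
        final int partition;
        final String payload;

        SearchResult(String correlationId, int partition, String payload) {
            this.correlationId = correlationId;
            this.partition = partition;
            this.payload = payload;
        }
    }

    /**
     * Assumption: a fixed, ordered list of P1 instance IDs is known to every
     * instance, and the index in that list is the partition the instance owns.
     */
    static int replyPartitionFor(String instanceId, List<String> instances) {
        int index = instances.indexOf(instanceId);
        if (index < 0) {
            throw new IllegalArgumentException("unknown instance: " + instanceId);
        }
        return index;
    }

    /** What the last stage does: build the result and address it to the caller's partition. */
    static SearchResult handle(SearchRequest request) {
        String payload = "results for " + request.query; // placeholder for the real search
        return new SearchResult(request.correlationId, request.replyPartition, payload);
    }
}
```

Intermediate stages only need to copy correlationId and replyPartition forward unchanged. The static instance list is the weak point of this sketch: adding a P1 server means adding a partition and updating that list everywhere.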

Whether Kafka is faster than REST comes down to how each is implemented. What is important here is to decide which pattern works for you: request-response, publish-subscribe, or something else. You can check this answer for REST vs Kafka.

JavaTechnical
  • Upvoted your response! Can you provide a short example of how one might implement this idea in Java? Would be super helpful! Thanks! – Bradford Griggs Mar 22 '22 at 21:51

Maybe it makes sense to have multiple topics for the answers, not just one big topic:

Use more than one topic

This way the "results" topics act as "mailboxes".

You'll probably need to set auto.create.topics.enable=true on the brokers, since creating topics by hand for every P1, ..., PN could be complicated.
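
A rough sketch of the "mailbox" bookkeeping on the P1 side, in plain Java with no Kafka client. It assumes each P1 instance has its own reply topic named search_results_<instanceId> and tags every query with a correlation ID, so a result arriving on the mailbox can be matched to the waiting client. The ReplyMailbox class, its methods, and the topic naming scheme are all hypothetical.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

final class ReplyMailbox {
    private final String instanceId;
    // Requests that have been sent but not yet answered, keyed by correlation ID.
    private final Map<String, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

    ReplyMailbox(String instanceId) {
        this.instanceId = instanceId;
    }

    /** Assumed naming scheme: one reply topic per P1 instance. */
    String replyTopic() {
        return "search_results_" + instanceId;
    }

    /** Call before publishing a query; the future completes when its result arrives. */
    CompletableFuture<String> expect(String correlationId) {
        CompletableFuture<String> future = new CompletableFuture<>();
        pending.put(correlationId, future);
        return future;
    }

    /**
     * Call for every record read from a results topic. Returns true only if the
     * record came from this instance's mailbox and matched a pending request.
     */
    boolean onResult(String topic, String correlationId, String payload) {
        if (!replyTopic().equals(topic)) {
            return false; // someone else's mailbox
        }
        CompletableFuture<String> future = pending.remove(correlationId);
        if (future == null) {
            return false; // unknown or already answered
        }
        future.complete(payload);
        return true;
    }
}
```

In this sketch the query message would carry both the correlation ID and the value of replyTopic(), and the final stage would publish its result to whatever topic name it finds there. A production version would also need to time out and remove pending entries that never get an answer.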

Iñigo González