Depending on the size of the dataset you're talking about and the volume of the stream, I'd try to cache the database contents as much as possible (assuming they don't change that often). Augmenting data by querying the database on every record is very expensive in terms of latency and throughput.
The way I've done this before is to spin up a thread whose only task is to keep a fresh local cache (usually a ConcurrentHashMap) and make it available to the process that needs it. In this case, you'd create a processor, give it a reference to that ConcurrentHashMap, and when a Kafka record comes in, look up the data by key, augment the record, and forward it either to a Sink processor or to another Streams processor, depending on what you want to do with it. Something like the sketch below.
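Here's a minimal sketch of that idea, assuming the newer Processor API (`org.apache.kafka.streams.processor.api`) and plain `String` keys/values; `EnrichingProcessor`, `startRefreshedCache`, and the loader supplier are just illustrative names, not anything from your code:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

import org.apache.kafka.streams.processor.api.Processor;
import org.apache.kafka.streams.processor.api.ProcessorContext;
import org.apache.kafka.streams.processor.api.Record;

public class EnrichingProcessor implements Processor<String, String, String, String> {

    // Shared cache, refreshed by a background thread outside the Streams topology.
    private final ConcurrentHashMap<String, String> cache;
    private ProcessorContext<String, String> context;

    public EnrichingProcessor(ConcurrentHashMap<String, String> cache) {
        this.cache = cache;
    }

    @Override
    public void init(ProcessorContext<String, String> context) {
        this.context = context;
    }

    @Override
    public void process(Record<String, String> record) {
        // Look up the reference data by the record key and append it to the value before forwarding.
        String referenceData = cache.get(record.key());
        String augmented = record.value() + "|" + referenceData;
        context.forward(record.withValue(augmented));
    }

    // Starts a single-threaded refresher that periodically reloads the cache.
    // The supplier stands in for a hypothetical DAO call returning the whole reference table.
    public static ConcurrentHashMap<String, String> startRefreshedCache(
            Supplier<Map<String, String>> loadAllFromDatabase) {
        ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
        ScheduledExecutorService refresher = Executors.newSingleThreadScheduledExecutor();
        refresher.scheduleAtFixedRate(
                () -> cache.putAll(loadAllFromDatabase.get()),
                0, 5, TimeUnit.MINUTES);
        return cache;
    }
}
```

You'd then wire it into the topology with something like `topology.addProcessor("enricher", () -> new EnrichingProcessor(cache), "source")`, and hang a sink (or further processors) off the "enricher" node.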
If the lookup fails, you can fall back to querying the database on demand, but you'll want to test and profile this, because in the worst-case scenario of 100% cache misses you're going to be hitting the database a lot.
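One way to add that fallback is to replace the `process` method from the sketch above with something like this; `queryDatabaseForKey(...)` is a placeholder for whatever single-key DAO call you have:

```java
@Override
public void process(Record<String, String> record) {
    String referenceData = cache.get(record.key());
    if (referenceData == null) {
        // Cache miss: fall back to a synchronous, single-key query.
        // queryDatabaseForKey(...) is a hypothetical DAO method.
        referenceData = queryDatabaseForKey(record.key());
        if (referenceData != null) {
            cache.put(record.key(), referenceData); // warm the cache for later records
        }
    }
    context.forward(record.withValue(record.value() + "|" + referenceData));
}
```

Keep in mind the miss path blocks the stream thread for the duration of the query, which is exactly why profiling it under realistic miss rates matters.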