Debezium with outbox pattern

Setting the context:

  1. Using
  2. We want to use Schema Registry to store all event schemas for the different business entities.
  3. One topic can hold multiple versions of the same schema.
  4. One topic can hold entirely different schemas bounded by a business context, e.g. customerCreated, customerPhoneUpdated, customerAddressUpdated (using one of the subject name strategies).
  5. I want to verify whether Debezium supports points 2 and 3 (especially 3).
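For reference, a minimal sketch of the subject names that Confluent's three built-in subject name strategies (TopicNameStrategy, RecordNameStrategy, TopicRecordNameStrategy) derive. The record names are hypothetical and only serve as illustration; the topic name is the one used in the example further down. The functions only mimic the naming rules, they are not the serializer's own code.

```python
# Sketch of the naming rules only; not Confluent's implementation.
# Record names below are hypothetical fully-qualified schema names.

def topic_name_strategy(topic: str, record_name: str, is_key: bool = False) -> str:
    # One subject per topic: every value schema must evolve compatibly.
    return f"{topic}-{'key' if is_key else 'value'}"

def record_name_strategy(topic: str, record_name: str, is_key: bool = False) -> str:
    # One subject per record type, independent of the topic.
    return record_name

def topic_record_name_strategy(topic: str, record_name: str, is_key: bool = False) -> str:
    # One subject per (topic, record type) pair.
    return f"{topic}-{record_name}"

TOPIC = "com.business.event"
RECORDS = [
    "com.business.CustomerCreated",       # hypothetical record name
    "com.business.CustomerPhoneUpdated",  # hypothetical record name
]

def subjects(strategy):
    """Subject each record type would be registered under for TOPIC."""
    return [strategy(TOPIC, name) for name in RECORDS]

if __name__ == "__main__":
    for strategy in (topic_name_strategy, record_name_strategy, topic_record_name_strategy):
        print(strategy.__name__, subjects(strategy))
```

Only the last two strategies give each event type its own subject, which is what allows unrelated schemas to share one topic.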

Imagine I have two business events, customerCreated and orderCreated, and I want to store both in the same topic "com.business.event".

customerCreated

{ "id": "244444", "name": "test", "address": "test 123", "email": "test@test.com" }

orderCreated

{ "id": "244444", "value": "1234", "address": "test 123", "phone": "3333", "deliverydate": "10-12-19" }

Structure of my outbox table is as per below article

https://debezium.io/blog/2019/02/19/reliable-microservices-data-exchange-with-the-outbox-pattern/

    Column     |          Type          | Modifiers
---------------+------------------------+-----------
 id            | uuid                   | not null
 aggregatetype | character varying(255) | not null
 aggregateid   | character varying(255) | not null
 type          | character varying(255) | not null
 payload       | jsonb                  | not null

Now, when I write my business events to the table above, the customerCreated and orderCreated events are stored in the payload column as a JSON string. If I publish them to the Kafka topic "com.business.event" using the Debezium connector, it produces the messages below (printed with the schema, for illustration).

customerCreated.json

{
"schema":
    {
        "type":"struct",
        "fields":[
            {
                "type":"string",
                "optional":false,
                "field":"eventType"
            },
            {
                "type":"string",
                "optional":false,
                "name":"io.debezium.data.Json",
                "version":1,
                "field":"payload"
            }
        ],
        "optional":false
    },
"payload":
    {
        "eventType":"Customer Created",
        "payload":"{\"id\": \"2971baea-e5a0-46cb-b1b1-273eaf88246a\", \"name\": \"jitender\", \"email\": \"test\", \"address\": \"700 \"}"
    }

}

orderCreated.json

{
"schema":
    {
        "type":"struct",
        "fields":[
            {
                "type":"string",
                "optional":false,
                "field":"eventType"
            },
            {
                "type":"string",
                "optional":false,
                "name":"io.debezium.data.Json",
                "version":1,
                "field":"payload"
            }
        ],
        "optional":false
    },
"payload":
    {
        "eventType":"Order Created",
        "payload":"{\"id\": \"2971baea-e5a0-46cb-b1b1-273eaf88246a\", \"value\": \"123\", \"deliverydate\": \"10-12-19\", \"address\": \"test\", \"phone\": \"700 \"}"
    }

}

Problem:

As the examples above show, the schema in Schema Registry/Kafka stays the same even though the payload contains different business entities. When I, as a consumer, try to deserialise such a message, I have to know that the payload can have a different structure depending on the business event it was generated from. In this scenario I cannot make full use of Schema Registry, because the consumer has to know all the business entities in advance.
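To make the consumer-side consequence concrete, here is a rough sketch of what decoding looks like today. The `EXPECTED_FIELDS` table and the `decode` helper are hypothetical; they stand for whatever per-event knowledge the consumer has to maintain by hand, since the registered schema only says that payload is a string.

```python
import json

# Hypothetical, hand-maintained knowledge about each business event.
# Nothing in the registered schema provides this; it only describes
# "eventType" and an opaque string "payload".
EXPECTED_FIELDS = {
    "Customer Created": {"id", "name", "email", "address"},
    "Order Created": {"id", "value", "deliverydate", "address", "phone"},
}

def decode(message: dict):
    """Return (event_type, body, missing_fields) for one envelope."""
    envelope = message["payload"]
    event_type = envelope["eventType"]
    # Second parse: the business event is a JSON string inside the envelope.
    body = json.loads(envelope["payload"])
    expected = EXPECTED_FIELDS.get(event_type)
    if expected is None:
        raise ValueError(f"unknown event type: {event_type!r}")
    missing = sorted(expected - set(body))
    return event_type, body, missing

# Envelope shaped like orderCreated.json above (schema part omitted).
order_message = {
    "payload": {
        "eventType": "Order Created",
        "payload": json.dumps({
            "id": "2971baea-e5a0-46cb-b1b1-273eaf88246a",
            "value": "123",
            "deliverydate": "10-12-19",
            "address": "test",
            "phone": "700 ",
        }),
    }
}

if __name__ == "__main__":
    print(decode(order_message))
```

Every new business event means another entry in that table on every consumer, which is exactly the coupling I would like Schema Registry to remove.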

Questions:

  1. What I want is for Debezium to create two different schemas under the same topic "com.business.event" using a subject name strategy (example: https://karengryg.io/2018/08/18/multi-schemas-in-one-kafka-topic/).

Then, when I consume a message, my consumer reads the schema ID from the message, fetches that schema from Schema Registry, and decodes the message directly with it. After decoding, I can ignore the message if I am not interested in that business event. This way I can have different schemas under the same topic using Schema Registry.

  2. Can I control the schema in the Kafka topic when I use Debezium in conjunction with Schema Registry? The outbox table (outbox pattern) is a must.
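As a sketch of the consumer side described in question 1: messages written by Confluent's Schema Registry serializers start with a magic byte (0) followed by the schema ID as a 4-byte big-endian integer, and the encoded record comes after that. The schema ID `42` and the `INTERESTING_SCHEMA_IDS` set are made up for illustration, and the sketch skips uninteresting events by schema ID before decoding, which is a small variation on the decode-then-ignore flow above.

```python
import struct

MAGIC_BYTE = 0  # first byte of the Confluent wire format

def split_confluent_message(raw: bytes):
    """Split a serialized value into (schema_id, encoded_record)."""
    if len(raw) < 5:
        raise ValueError("message too short for the 5-byte header")
    # ">bI": big-endian, 1-byte magic + 4-byte unsigned schema ID.
    magic, schema_id = struct.unpack(">bI", raw[:5])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte: {magic}")
    return schema_id, raw[5:]

# Hypothetical: schema IDs of the business events this consumer cares about.
INTERESTING_SCHEMA_IDS = {42}

def handle(raw: bytes):
    """Return the encoded record if the event is of interest, else None."""
    schema_id, record = split_confluent_message(raw)
    if schema_id not in INTERESTING_SCHEMA_IDS:
        return None  # some other business event on the shared topic
    # A real consumer would now fetch the schema by ID and decode `record`.
    return record

if __name__ == "__main__":
    wanted = struct.pack(">bI", MAGIC_BYTE, 42) + b"customer-bytes"
    other = struct.pack(">bI", MAGIC_BYTE, 7) + b"order-bytes"
    print(handle(wanted), handle(other))
```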

1 Answer

Please take a look at https://issues.jboss.org/browse/DBZ-1297. This is probably the solution to your problem and questions, as it aims to unwind the opaque string into a Kafka Connect structure. In that case you will have the schema exposed.

It would be good if you could try it with the schema-per-subject name strategy.
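To illustrate what "schema exposed" would mean, here is a small sketch that derives a Kafka Connect style schema description from each JSON payload. This is not how DBZ-1297 is implemented; `infer_schema` is a hypothetical helper, all fields are assumed non-optional, and arrays and nulls are deliberately left out.

```python
import json

def infer_schema(value, field=None):
    """Describe a parsed JSON value using Connect's JSON schema type names."""
    if isinstance(value, dict):
        schema = {
            "type": "struct",
            "fields": [infer_schema(v, k) for k, v in value.items()],
            "optional": False,
        }
    elif isinstance(value, bool):  # check bool before int: bool is an int subclass
        schema = {"type": "boolean", "optional": False}
    elif isinstance(value, int):
        schema = {"type": "int64", "optional": False}
    elif isinstance(value, float):
        schema = {"type": "float64", "optional": False}
    elif isinstance(value, str):
        schema = {"type": "string", "optional": False}
    else:
        # Arrays and nulls are out of scope for this sketch.
        raise TypeError(f"unsupported JSON value: {value!r}")
    if field is not None:
        schema["field"] = field
    return schema

def field_names(schema):
    """Names of the top-level fields of a struct schema."""
    return [f["field"] for f in schema["fields"]]

customer = json.loads(
    '{"id": "2971baea-e5a0-46cb-b1b1-273eaf88246a", "name": "jitender", '
    '"email": "test", "address": "700 "}'
)
order = json.loads(
    '{"id": "2971baea-e5a0-46cb-b1b1-273eaf88246a", "value": "123", '
    '"deliverydate": "10-12-19", "address": "test", "phone": "700 "}'
)

if __name__ == "__main__":
    print(field_names(infer_schema(customer)))
    print(field_names(infer_schema(order)))
```

With the payload expanded like this, the two events no longer share one string-typed schema, so each could be registered under its own subject.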

Jiri Pechanec