
When I run a changefeed into Kafka, it emits messages for a while but then gets stuck. In the job status or logs I see the error: `kafka server: Message was too large, server rejected it to avoid allocation error`.

What does this mean and how do I fix it?

histocrat

1 Answer


Changefeeds emit one message per changed row, and the size of an individual message is roughly proportional to the size of the row that changed. If your largest rows are big enough, a batch, or even a single message, can exceed the maximum message size your Kafka server is configured to accept. This most often happens with a jsonb column. An oversized message blocks the changefeed from making progress on the ranges containing those rows, and repeated retries can send large volumes of duplicates of the smaller messages that were batched with the oversized one downstream.

The simplest solution, if possible, is to increase Kafka's maximum message size. This answer explains how to adjust the broker configuration.
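For reference, the relevant broker settings are `message.max.bytes` (broker-wide) and, per topic, `max.message.bytes`; the 10 MiB value below is only an example, and you should size it to your actual rows:

```properties
# server.properties — raise the broker-wide message cap (example: 10 MiB)
message.max.bytes=10485760
# replication fetches must also be able to move the larger messages
replica.fetch.max.bytes=10485760
```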

If you aren't able to adjust Kafka's broker settings, you can tune some client-side settings via the kafka_sink_config option, as documented here. By default, Kafka changefeeds are configured to minimize message size, but they will batch messages if they arrive faster than they can be sent out. Disabling batching entirely may therefore avoid this error.
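As a sketch, a changefeed created with batching effectively disabled might look like this (the table name and broker address are placeholders; check the kafka_sink_config documentation for the exact keys supported by your version):

```sql
-- Flush after every message rather than accumulating a batch.
CREATE CHANGEFEED FOR TABLE events
  INTO 'kafka://broker:9092'
  WITH kafka_sink_config = '{"Flush": {"Messages": 1, "Frequency": "100ms"}}';
```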

On recent versions of CockroachDB, executing SET CLUSTER SETTING changefeed.batch_reduction_retry_enabled = true in SQL will enable experimental behavior to reduce batch sizes in response to the error, which may solve the problem as long as no individual row is too big. Existing changefeeds will need to be paused and resumed to pick up the setting. Future versions of CockroachDB will have this behavior on by default.
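Putting that together, the sequence looks roughly like this (the job ID is a placeholder; find yours with SHOW JOBS):

```sql
-- Enable the experimental batch-size-reduction retry behavior.
SET CLUSTER SETTING changefeed.batch_reduction_retry_enabled = true;

-- An existing changefeed only picks up the setting after a pause/resume.
PAUSE JOB 123456789;
RESUME JOB 123456789;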

A last resort would be to remove or compress the offending rows in the table, then start a fresh copy of the changefeed. Note that simply resuming the existing feed, or starting a new one with a cursor from before the rows were fixed, won't work, as the feed will still try to emit the old version of each row.
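A minimal sketch of that last resort, assuming a hypothetical `events` table with a jsonb `payload` column and a 1 MiB broker cap:

```sql
-- 1. Find rows whose serialized payload is near or over the cap.
SELECT id, length(payload::STRING) AS approx_bytes
  FROM events
 WHERE length(payload::STRING) > 1000000;

-- 2. Shrink or remove the offenders, e.g.:
UPDATE events SET payload = '{}' WHERE id = 42;

-- 3. Start a fresh changefeed (a new full scan, not a resume or cursor):
CREATE CHANGEFEED FOR TABLE events
  INTO 'kafka://broker:9092';
```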

histocrat