
I am trying to come up with a configuration that enforces a producer quota based on the producer's average byte rate. I ran a test against a 3-node cluster. The topic, however, was created with 1 partition and a replication factor of 1, so that producer_byte_rate is measured against only one broker (the leader).

I set the producer_byte_rate to 20480 on client id test_producer_quota.
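For reference, a quota like this is typically applied with the `kafka-configs` tool; a sketch (the ZooKeeper address is a placeholder for your cluster, and newer brokers also accept `--bootstrap-server` for this):

```shell
# Hedged sketch: apply a 20 KB/s producer byte-rate quota to the client id
kafka-configs --zookeeper localhost:2181 --alter \
  --add-config 'producer_byte_rate=20480' \
  --entity-type clients --entity-name test_producer_quota
```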

I used kafka-producer-perf-test to test out the throughput and throttle.

kafka-producer-perf-test --producer-props bootstrap.servers=SSL://kafka-broker1:6667 \
client.id=test_producer_quota \
--topic quota_test \
--producer.config /myfolder/client.properties \
--record-size 2048  --num-records 4000 --throughput -1

I expected the producer client to learn about the throttle and eventually smooth out its requests to the broker. Instead I noticed alternating throughput of 98 recs/sec and 21 recs/sec for more than 30 seconds. During this time the average latency slowly kept increasing, and when it finally hit 120000 ms I started to see a TimeoutException, as below:

org.apache.kafka.common.errors.TimeoutException: Expiring 7 records for quota_test-0: 120000 ms has passed since batch creation.

What is possibly causing this issue?

  1. The producer hits the timeout when latency reaches 120 seconds (the default value of delivery.timeout.ms).
  2. Why isn't the producer learning about the throttle and quota and slowing down or backing off? What other producer configuration could help alleviate this timeout issue?
OneCricketeer
Balan
    Does this answer your question? [Kafka Producer TimeOutException](https://stackoverflow.com/questions/53223129/kafka-producer-timeoutexception) – Giorgos Myrianthous Mar 13 '20 at 10:47
  • `kafka-producer-perf-test` is typically used to stress test Kafka infrastructure and configuration. In that sense it's a "dumb" producer that is not supposed to learn or respect broker's back-pressure. – mazaneicha Mar 13 '20 at 15:21

1 Answer


(2048 * 4000) / 20480 = 400 (sec)

This means that if your producer tries to send the 4000 records at full speed (which is the case here, since you set --throughput to -1), it will batch them and put them in its local queue within a second or two (depending on your CPU).

Then, given your quota setting (20480 bytes/sec), you can be sure the broker won't 'complete' the processing of those 4000 records before roughly 400 seconds have elapsed.
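Spelled out with the numbers from the question:

```shell
# Total bytes sent by the perf test, and the minimum time to drain
# them through the broker at the quota rate (integer arithmetic)
total_bytes=$((2048 * 4000))          # 2048-byte records * 4000 records
min_seconds=$((total_bytes / 20480))  # at producer_byte_rate=20480
echo "${total_bytes} bytes need at least ${min_seconds} s"
```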

The broker does not return an error when a client exceeds its quota, but instead attempts to slow the client down. The broker computes the amount of delay needed to bring a client under its quota and delays the response for that amount of time. 
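As a rough sketch of that delay computation (the window size and observed rate below are hypothetical illustration values, not Kafka's actual defaults or internals):

```shell
# Hedged sketch of the broker-side delay: hold the response back long
# enough that the rate, averaged over the sampling window, falls back
# under the quota.
observed_rate=40960   # bytes/sec measured over the window (hypothetical)
quota=20480           # producer_byte_rate
window_ms=10000       # sampling window in ms (hypothetical)
throttle_ms=$(( (observed_rate - quota) * window_ms / quota ))
echo "delay response by ${throttle_ms} ms"
```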

With delivery.timeout.ms at its default of 120000 ms (120 seconds), batches that cannot be drained within that window expire, and you get this TimeoutException.
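If you must push a fixed payload through a quota this tight, the usual levers are to raise the delivery timeout so queued batches survive the throttle, or to cap the producer's own rate so the queue never builds up. A hedged example (values are illustrative, not recommendations):

```properties
# client.properties additions (illustrative values)
# Let batches wait out the broker throttle instead of expiring at 120 s
delivery.timeout.ms=600000
# Optionally shrink the accumulator so back-pressure surfaces sooner
buffer.memory=1048576
```

With kafka-producer-perf-test you can also pass `--throughput 10` instead of `-1`, which caps the send rate below the quota (10 recs/sec * 2048 bytes = 20480 bytes/sec) and avoids the queue build-up entirely.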

Yannick
  • This is not the behavior I am seeing though. The producer is able to send alternating batches of 98 and 21 records, and the error appears only after completing 3600. Also, the default request timeout is 30 seconds, while the error happens after 2 minutes, which is the default delivery timeout. While I agree that the perf producer may not implement back-pressure, the producer client internally knows the broker channel is muted, so my question is why there is no internal back-pressure mechanism. Is there a reason this needs to be handled upstream? – Balan Mar 14 '20 at 01:49