2

We've configured an MSK (Kafka) event source as the trigger for our Lambda function. Even though the offset lag keeps increasing, the Lambda concurrency stays at 4-5 almost all the time, as can be seen in the graph below. The configuration used for the MSK event source is:

Batch Size: 50
Batch window: 30 seconds
Number of partitions in the Kafka topic: 10
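
As a rough sketch, the settings above map onto the parameters of Lambda's `CreateEventSourceMapping` API as shown below. The cluster ARN, function name, topic name, and starting position are placeholders assumed for illustration, not values from the actual setup. The partition count is a property of the Kafka topic, so it does not appear in the mapping.

```python
def build_msk_event_source_mapping(cluster_arn, function_name, topic):
    """Return keyword arguments for Lambda's CreateEventSourceMapping API.

    Only BatchSize and MaximumBatchingWindowInSeconds come from the question;
    everything else is a placeholder supplied by the caller or assumed here.
    """
    return {
        "EventSourceArn": cluster_arn,
        "FunctionName": function_name,
        "Topics": [topic],
        "BatchSize": 50,                       # "Batch Size: 50"
        "MaximumBatchingWindowInSeconds": 30,  # "Batch window: 30 seconds"
        "StartingPosition": "LATEST",          # assumption, not stated in the question
    }


# Hypothetical identifiers, for illustration only.
params = build_msk_event_source_mapping(
    "arn:aws:kafka:us-east-1:123456789012:cluster/example/placeholder",
    "example-consumer",
    "example-topic",
)

# With boto3 installed and credentials configured, the mapping could be created with:
#   import boto3
#   boto3.client("lambda").create_event_source_mapping(**params)
```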

I made sure that the load is distributed equally across all the partitions. Is there anything I'm missing here which is causing the concurrency issue? Any solution is appreciated. Thanks in advance.

Offset Lag vs Concurrent Executions

pkgajulapalli
  • Is a concurrency limit set on the function? If not, what available amount is shown when you try to set one? Is the Lambda by any chance running in a subnet that is short on IP addresses? – Register Sole Feb 10 '23 at 02:11
  • @RegisterSole, there is no concurrency limit on the function. The account-level unreserved concurrency is 1000. Other Lambdas are running without any resource crunch, and there are not enough resources running in the subnet to cause an IP crunch. – pkgajulapalli Feb 10 '23 at 16:33
  • Can you change the aggregation for ConcurrentExecutions to `max` ([official reference](https://docs.aws.amazon.com/lambda/latest/dg/monitoring-metrics.html))? What is the value after this change? It is strange that the number is a decimal. – Register Sole Feb 15 '23 at 01:52
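
The aggregation change suggested in the last comment can also be checked outside the console. Below is a minimal sketch that builds a CloudWatch `GetMetricStatistics` request for `ConcurrentExecutions` with the `Maximum` statistic; the function name, time range, and period are assumptions. The small arithmetic example at the end shows why an averaged graph can display a decimal even though concurrency itself is always a whole number.

```python
from datetime import datetime, timedelta, timezone


def build_max_concurrency_query(function_name, hours=3, period_seconds=60):
    """Build GetMetricStatistics parameters for peak concurrency per period."""
    end = datetime.now(timezone.utc)
    start = end - timedelta(hours=hours)
    return {
        "Namespace": "AWS/Lambda",
        "MetricName": "ConcurrentExecutions",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "StartTime": start,
        "EndTime": end,
        "Period": period_seconds,
        "Statistics": ["Maximum"],  # instead of "Average"
    }


# Hypothetical function name, for illustration only.
query = build_max_concurrency_query("example-consumer")

# With boto3 installed and credentials configured:
#   import boto3
#   datapoints = boto3.client("cloudwatch").get_metric_statistics(**query)["Datapoints"]

# Made-up samples within one period: the average is fractional, the maximum is not.
samples = [4, 5, 5, 4]
average = sum(samples) / len(samples)
peak = max(samples)
```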

2 Answers

1

I think you are hitting the same limitation we found some months ago. This link pointed us in the right direction (toward a workaround, in our case):

AWS MSK lambda concurrent consumers

It honestly makes sense that the partitions are not being used to their full capacity, because the jump from the MSK EC2 setup to the Lambda runtime is not trivial. Maybe you can try other connectors:

https://docs.confluent.io/kafka-connectors/aws-lambda/current/overview.html#multiple-tasks

It also makes sense that bridging through Kinesis would avoid these specific issues, since it is all Amazon-native.

1

Ideally, concurrency should be a whole number (no decimals) that matches the number of consumers.

When you initially create an Apache Kafka event source, Lambda allocates one consumer to process all partitions in the Kafka topic. Each consumer has multiple processors running in parallel to handle increased workloads. Additionally, Lambda automatically scales up or down the number of consumers, based on workload. To preserve message ordering in each partition, the maximum number of consumers is one consumer per partition in the topic.

source: https://docs.amazonaws.cn/en_us/lambda/latest/dg/with-kafka.html#services-kafka-scaling
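
Applied to the numbers in the question, the quoted rule can be sketched as follows. This assumes that function concurrency roughly tracks the number of consumers, which the passage suggests but does not state outright; the helper names are made up for illustration.

```python
def consumer_ceiling(num_partitions):
    """Upper bound on consumers: at most one consumer per partition."""
    if num_partitions < 1:
        raise ValueError("a topic needs at least one partition")
    return num_partitions


def unused_consumer_slots(num_partitions, observed_concurrency):
    """How far the observed concurrency sits below the per-partition ceiling."""
    return max(consumer_ceiling(num_partitions) - observed_concurrency, 0)


# Numbers from the question: 10 partitions, concurrency hovering around 4-5.
ceiling = consumer_ceiling(10)
headroom = unused_consumer_slots(10, 5)
```

With 10 partitions the ceiling is 10, so an observed concurrency of 5 is below the partition limit. Under this assumption, the partition count alone does not explain the cap.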

The offset lag indicates a performance issue. This blog gives a better explanation: Offset lag metric