In my production environment, I get an intermittent NPE when consuming a Kafka record. It happens only once, shortly after startup. It does not occur on every deployment, and out of multiple pods only a single pod hits it at startup.

The error is triggered around 3 seconds after Spring logs the Started (myapplication) in xx seconds message, and it happens only once for that deployment, on that single pod. When I deploy again, it happens again.

I don't think this is a configuration problem, since everything works fine after that single exception. I am not sure whether the record is reprocessed afterwards, but I assume it is, since its offset is not committed.

The root cause exception:

Caused by: java.lang.NullPointerException: Cannot invoke "java.lang.Boolean.booleanValue()" because the return value of "java.util.Map.get(Object)" is null
    at org.springframework.kafka.listener.adapter.DelegatingInvocableHandler.invoke(DelegatingInvocableHandler.java:204)

The thrown exception:

org.springframework.kafka.KafkaException: Seek to current after exception; nested exception is org.springframework.kafka.listener.ListenerExecutionFailedException: Listener method 'public default void com.lightspeed.hospitality.orderpay.app.kafka.consumers.GenericConsumer.onRawEvent(org.apache.kafka.clients.consumer.ConsumerRecord<java.lang.String, byte[]>)' threw exception; nested exception is java.lang.NullPointerException: Cannot invoke "java.lang.Boolean.booleanValue()" because the return value of "java.util.Map.get(Object)" is null; nested exception is java.lang.NullPointerException: Cannot invoke "java.lang.Boolean.booleanValue()" because the return value of "java.util.Map.get(Object)" is null
    at org.springframework.kafka.listener.SeekUtils.seekOrRecover(SeekUtils.java:208)

For my consumer, I use @KafkaListener on the class, and the handler method is:

@KafkaHandler(isDefault = true)
void onRawEvent(ConsumerRecord<String, byte[]> consumerRecord) 

Does anyone have an idea of where to investigate? I checked the spring-kafka code but have gotten nowhere so far.

The issue is caused by this line: return new InvocationResult(result, replyTo, this.handlerReturnsMessage.get(handler));

When it executes this.handlerReturnsMessage.get(handler), the call returns null, which cannot be unboxed to a primitive boolean. But I still have no clue why this happens or whether it is a bug.
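To illustrate the unboxing failure in isolation, here is a minimal sketch that does not use spring-kafka at all. The class name `UnboxingDemo` and the key `"missing"` are made up for the example; the only point is that reading an absent key from a `Map<Object, Boolean>` into a primitive `boolean` throws the same kind of NPE as in the stack trace.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo class, not part of spring-kafka.
final class UnboxingDemo {

    // Returns true if reading an absent key as a primitive boolean throws an NPE.
    static boolean missingKeyThrows() {
        Map<Object, Boolean> flags = new ConcurrentHashMap<>();
        try {
            // get() returns null for an absent key; assigning it to a primitive
            // makes the compiler insert a call to Boolean.booleanValue() on null.
            boolean value = flags.get("missing");
            return !value && false; // not reached: the line above throws
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("NPE on missing key: " + missingKeyThrows());
    }
}
```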

What I use:

  • spring-boot 2.7.0
  • spring-kafka 2.8.6
  • kafka-clients 3.0.1
rmrt
  • Does this answer your question? [What is a NullPointerException, and how do I fix it?](https://stackoverflow.com/questions/218384/what-is-a-nullpointerexception-and-how-do-i-fix-it) – tgdavies Jul 04 '23 at 08:17
  • What's the source code of `public default void com.lightspeed.hospitality.orderpay.app.kafka.consumers.GenericConsumer.onRawEvent(org.apache.kafka.clients.consumer.ConsumerRecord)`? What's `handlerReturnsMessage` and how are items added to it? – tgdavies Jul 04 '23 at 08:19
  • `handlerReturnsMessage` is part of the spring-kafka code. Refer [to this code](https://github.com/spring-projects/spring-kafka/blob/v2.8.6/spring-kafka/src/main/java/org/springframework/kafka/listener/adapter/DelegatingInvocableHandler.java#L204). The `onRawEvent` code doesn't really matter, since execution never reaches it; it already fails in the spring-kafka `invoke` code (it can't find the handler method). But again, this only happens once (always after a new deployment, but not on all pods, and not on every new deployment); the next seek (I guess for the same offset) works properly. – rmrt Jul 04 '23 at 09:04
  • @tgdavies for the first comment -> no, because my question is more toward spring-kafka behavior. Not the NPE itself. I know the cause, but I don't know why spring-kafka handler map can return null. – rmrt Jul 04 '23 at 09:08
  • Well there's a race in `getHandlerForPayload` where one thread has done `this.cachedHandlers.putIfAbsent(payloadClass, handler)` but `setupReplyTo(handler)` hasn't completed, and the second thread sees that `this.cachedHandlers.get(payloadClass)` is non null. Perhaps that's the problem. – tgdavies Jul 04 '23 at 09:24
  • Can you elaborate? `setupReplyTo` is executed after `putIfAbsent`, and it doesn't seem to be async; it runs until it finishes `this.handlerReturnsMessage.put`. And that is executed within `InvocableHandlerMethod handler = getHandlerForPayload(payloadClass);`, right? `cachedHandlers` is just a ConcurrentMap. – rmrt Jul 04 '23 at 13:40

1 Answer

This is one possible scenario that can cause the NPE you are seeing. You'll need to check your logs to see whether it is happening this way.

The DelegatingInvocableHandler class does no synchronization of its own, other than using some ConcurrentHashMaps.

Two threads, T1 and T2, call invoke concurrently. They are processing two different messages of the same type. At this point handlerReturnsMessage and cachedHandlers are empty.

T1 enters getHandlerForPayload, this.cachedHandlers.get(payloadClass) returns null, so T1 enters the if block. T1 caches the handler with this.cachedHandlers.putIfAbsent(payloadClass, handler), but has not yet called setupReplyTo(handler).

T2 enters getHandlerForPayload, this.cachedHandlers.get(payloadClass) returns non-null, so T2 just returns the handler.

T2 reaches return new InvocationResult(result, replyTo, this.handlerReturnsMessage.get(handler)). The Map handlerReturnsMessage is still empty, so an NPE results.

T1 executes setupReplyTo(handler). Now handlerReturnsMessage is no longer empty.
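The interleaving above can be forced deterministically in a small simulation. This is only a sketch, not the spring-kafka source: the class `HandlerCacheRace`, the latches, and the plain `Object` handler are hypothetical stand-ins, and the `put` into `handlerReturnsMessage` stands in for `setupReplyTo(handler)`. The latches pin the threads to the exact order T1 publish, T2 read, T1 setup.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

// Hypothetical simulation of the two maps; not the real DelegatingInvocableHandler.
final class HandlerCacheRace {
    private final Map<Class<?>, Object> cachedHandlers = new ConcurrentHashMap<>();
    private final Map<Object, Boolean> handlerReturnsMessage = new ConcurrentHashMap<>();
    private final Object handler = new Object();

    // Latches exist only to force the problematic ordering.
    private final CountDownLatch handlerPublished = new CountDownLatch(1);
    private final CountDownLatch t2Finished = new CountDownLatch(1);

    private static void await(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // T1: publishes the handler first, records its metadata only later.
    void t1Lookup(Class<?> payloadClass) {
        cachedHandlers.putIfAbsent(payloadClass, handler);
        handlerPublished.countDown();
        await(t2Finished);                         // T2 runs inside this window
        handlerReturnsMessage.put(handler, false); // stand-in for setupReplyTo(handler)
    }

    // T2: finds the cached handler and immediately reads its metadata.
    boolean t2Invoke(Class<?> payloadClass) {
        await(handlerPublished);
        Object cached = cachedHandlers.get(payloadClass); // non-null
        try {
            return handlerReturnsMessage.get(cached);     // null, so unboxing throws
        } finally {
            t2Finished.countDown();
        }
    }

    // Returns true if T2 hit the NPE under the forced interleaving.
    static boolean reproduces() {
        HandlerCacheRace race = new HandlerCacheRace();
        Thread t1 = new Thread(() -> race.t1Lookup(String.class));
        t1.start();
        boolean sawNpe;
        try {
            race.t2Invoke(String.class);
            sawNpe = false;
        } catch (NullPointerException e) {
            sawNpe = true;
        }
        try {
            t1.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sawNpe;
    }

    public static void main(String[] args) {
        System.out.println("NPE reproduced: " + reproduces());
    }
}
```

In a real application the window between the two map writes is tiny, which would be consistent with the failure showing up only occasionally and only right after startup, while the caches are still empty.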

tgdavies
  • Ah okay, that makes sense. thank you for the explanation, I will check if it is indeed the problem. – rmrt Jul 04 '23 at 14:31
  • @tgdavies - thanks for the analysis https://github.com/spring-projects/spring-kafka/issues/2723 Please note that 2.8.x is no longer supported as OSS https://spring.io/projects/spring-kafka#support - the fix will be in 2.9.10 and 3.0.9. 2.9.x is compatible with Boot 2.7.x. – Gary Russell Jul 05 '23 at 14:40
  • @GaryRussell no worries. I guess the fix is just to move the `setupReplyTo` call before the `putIfAbsent`, but I don't really have the context... – tgdavies Jul 05 '23 at 21:50
  • Yep, already fixed it that way. – Gary Russell Jul 06 '23 at 11:18
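
The reordering discussed in the comments can be sketched in the same simplified form. Again, this is an illustration under assumptions and not the actual spring-kafka patch: `ReorderedHandlerCache` and its fields are hypothetical, and the `put` into `handlerReturnsMessage` stands in for `setupReplyTo(handler)`. The idea is that the metadata is written before the handler becomes visible through `cachedHandlers`, so any thread that finds the handler in the cache should also find its entry.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the reordered lookup; not the real spring-kafka code.
final class ReorderedHandlerCache {
    private final Map<Class<?>, Object> cachedHandlers = new ConcurrentHashMap<>();
    private final Map<Object, Boolean> handlerReturnsMessage = new ConcurrentHashMap<>();
    private final Object handler = new Object();

    Object getHandlerForPayload(Class<?> payloadClass) {
        Object cached = cachedHandlers.get(payloadClass);
        if (cached == null) {
            // Record the metadata first (stand-in for setupReplyTo(handler))...
            handlerReturnsMessage.put(handler, false);
            // ...and only then publish the handler to other threads.
            cachedHandlers.putIfAbsent(payloadClass, handler);
            cached = handler;
        }
        return cached;
    }

    boolean invoke(Class<?> payloadClass) {
        Object h = getHandlerForPayload(payloadClass);
        return handlerReturnsMessage.get(h); // entry exists before h was published
    }

    // Runs two competing threads per round on a fresh cache and counts NPEs.
    static int countFailures(int rounds) {
        AtomicInteger failures = new AtomicInteger();
        for (int i = 0; i < rounds; i++) {
            ReorderedHandlerCache cache = new ReorderedHandlerCache();
            Runnable task = () -> {
                try {
                    cache.invoke(String.class);
                } catch (NullPointerException e) {
                    failures.incrementAndGet();
                }
            };
            Thread a = new Thread(task);
            Thread b = new Thread(task);
            a.start();
            b.start();
            try {
                a.join();
                b.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return failures.get();
    }

    public static void main(String[] args) {
        System.out.println("Failures: " + countFailures(200));
    }
}
```

A stress loop like `countFailures` cannot prove the absence of a race on its own; the argument rests on the ordering, since a thread that observes the `putIfAbsent` result in a `ConcurrentHashMap` also observes the writes the publishing thread made before it.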