
We have a Hadoop cluster (version 2.6.4) with the Ambari GUI. In our cluster we have 3 Kafka machines, which are standalone machines, while 3 ZooKeeper servers are installed on other machines - master01/02/03.

On one of the Kafka machines we see a strange problem, while the other Kafka machines do not have this problem.

The problem is: when we start the Kafka broker, after a couple of minutes it goes down.

Here are the logs:

From kafka.err:

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "ThrottledRequestReaper-Fetch"

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "ExpirationReaper-1002"

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "kafka-network-thread-1002-PLAINTEXT-2"
Exception in thread "ExpirationReaper-1002" Exception in thread "ExpirationReaper-1002" java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "ExpirationReaper-1002"
Exception in thread "metrics-meter-tick-thread-2" java.lang.OutOfMemoryError: Java heap space
Exception in thread "metrics-meter-tick-thread-3" java.lang.OutOfMemoryError: Java heap space
Exception in thread "metrics-meter-tick-thread-4" java.lang.OutOfMemoryError: Java heap space
Exception in thread "metrics-meter-tick-thread-5" java.lang.OutOfMemoryError: Java heap space

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "main-SendThread(master02.sys87.com:2181)"
Exception in thread "metrics-meter-tick-thread-6" java.lang.OutOfMemoryError: Java heap space
Exception in thread "metrics-meter-tick-thread-1" java.lang.OutOfMemoryError: Java heap space
Exception in thread "metrics-meter-tick-thread-7" java.lang.OutOfMemoryError: Java heap space
Exception in thread "metrics-meter-tick-thread-9" java.lang.OutOfMemoryError: Java heap space
Exception in thread "metrics-meter-tick-thread-10" java.lang.OutOfMemoryError: Java heap space
Exception in thread "metrics-meter-tick-thread-11" java.lang.OutOfMemoryError: Java heap space
Exception in thread "metrics-meter-tick-thread-1" java.lang.OutOfMemoryError: Java heap space

From reading the log, it looks like this is a heap-space allocation problem on the Kafka machine.

Any advice on what the solution for this is?

Second:

How can we explain that this problem occurs on one of the Kafka machines while the other two do not have it? Is that logical?


1 Answer


You have an OutOfMemoryError, which means that at some point the Kafka instance needed to allocate more memory and found that either no physical memory was available or it had reached a limit set in the JVM options (note that Kafka is written in Java/Scala, so it runs in a JVM). It then called the garbage collector to free some memory, but the GC couldn't free enough.

Why could it happen? There are multiple possible reasons.

  • A bug in Kafka code that keeps unused memory from being freed

  • Excessive load that the current machine can't handle

  • Improper use or configuration. For example, you set up a stream, connected to it, but don't read from it, or read too slowly. The backlog grows until it fills your whole memory.

  • Too strict memory allowances for the Kafka instance. To let it take more memory, run in bash `export KAFKA_HEAP_OPTS="-Xmx1G -Xms1G"` (try to find a working value; see the sketch after this list). More detail here: https://stackoverflow.com/a/36649296/78569

  • Conflict between JVM options and cgroups configuration. E.g. you set -Xmx2G but only 1G in cgroups (memory.limit_in_bytes).

  • Using/configuring Docker (which uses LXC, which uses cgroups) or another virtualization/containerization tool improperly. Or even properly - I've heard there are some misunderstandings between JVM options and cgroups limits that are only fixed in beta releases of Java.

    This is not a full list, but a starting point for exploring where your problem lies.
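As a rough illustration of the heap-allowance and cgroup points above, a minimal bash sketch (the 4G heap and the cgroup path are placeholder values, not recommendations for your hardware):

```bash
# Raise the broker heap before starting Kafka (overrides the default -Xmx1G -Xms1G).
# 4G is an example value - size it to your machine and workload.
export KAFKA_HEAP_OPTS="-Xmx4G -Xms4G"

# If the broker runs inside a cgroup/container, check that the memory limit is
# comfortably larger than -Xmx (the JVM needs memory beyond the heap as well).
# cgroup v1 path shown; the exact path depends on your OS and cgroup layout.
cat /sys/fs/cgroup/memory/memory.limit_in_bytes
```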

EDIT
If you see no obvious flaw in the configuration and behavior of the broker, you can analyze a process dump taken at the time of the crash to see where all the memory goes. To do that, add -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=... to the JVM options. Then you can load this dump into an analyzer like HeapWalker and look for an unusually big number or size of objects.
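A minimal sketch of how this might look for a broker started via kafka-server-start.sh, assuming the stock scripts that pass KAFKA_OPTS through to the JVM; the dump path is an example only:

```bash
# Write a heap dump when the broker hits an OutOfMemoryError.
# Point the path at a disk with enough free space to hold a multi-GB dump file.
export KAFKA_OPTS="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/kafka/heapdump.hprof"
```

The resulting .hprof file can then be opened in a heap analyzer (HeapWalker, Eclipse MAT, etc.) to see which objects dominate the heap.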

  • @Imaskar, the default in the file /usr/hdp/2.6.4.0-91/kafka/bin/kafka-server-start.sh is KAFKA_HEAP_OPTS="-Xmx1G -Xms1G", so do you suggest increasing it, for example to KAFKA_HEAP_OPTS="-Xmx5G -Xms5G"? – enodmilvado May 11 '18 at 14:16
  • Yes, increasing the memory allowance is a good start. See how much you can let it take. But note that the Java process is not only the heap, so don't set a 6G heap if you only have 6G of RAM. If increasing memory doesn't help, then you have a problem in the configuration; explore the other options. The next step I'd take is to watch whether some topic grows uncontrolled. – Imaskar May 11 '18 at 14:22
  • @Imaskar, thank you so much, this really helps to understand things better. We have 256G on each Kafka machine, these are very strong machines, so we can increase it without worry. A second important point: we restarted the machine yesterday, but the restart didn't help and the Kafka broker failed again after some time. Also, about what you said regarding a "topic growing uncontrolled": can you suggest how to verify this? Do you mean to watch or trace the topic sizes under /var/kafka/kafka-logs/? – enodmilvado May 11 '18 at 14:30
  • Yes, topic sizes and some other metrics like `kafka.network:type=RequestChannel,name=RequestQueueSize`. Maybe a lot of requests come in faster than they can be served and get queued until the memory limit is reached. – Imaskar May 11 '18 at 14:57
  • I updated the answer with another approach to analyze the problem. – Imaskar May 11 '18 at 15:03
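Following up on the comment thread, a minimal sketch of the suggested checks (the log directory and the HDP path are taken from the comments above; the JMX sampling assumes the broker was started with JMX_PORT set, e.g. 9999):

```bash
# 1. Watch on-disk topic/partition sizes (log.dirs is /var/kafka/kafka-logs here).
du -sh /var/kafka/kafka-logs/* | sort -h | tail -20

# 2. Sample the request queue depth over JMX; a queue that only grows suggests
#    requests arrive faster than the broker can serve them.
/usr/hdp/2.6.4.0-91/kafka/bin/kafka-run-class.sh kafka.tools.JmxTool \
  --object-name 'kafka.network:type=RequestChannel,name=RequestQueueSize' \
  --jmx-url service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi \
  --reporting-interval 5000
```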