Cassandra high cpu load issues (3.11.1)

Question

I need help diagnosing high CPU utilisation on Cassandra.

We have a 12-node Cassandra cluster with the following spec:

8 cores
16GB HEAP/32GB RAM with G1GC

All of a sudden I started seeing high CPU load (around 18-24 on 8-core nodes).

A Cassandra thread dump showed a lot of runnable threads like the ones below.

MessagingService-Incoming-/10.xx.xx.xx
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at org.apache.cassandra.io.util.NIODataInputStream.reBuffer(NIODataInputStream.java:66)
at org.apache.cassandra.io.util.RebufferingInputStream.readByte(RebufferingInputStream.java:144)
at org.apache.cassandra.io.util.RebufferingInputStream.readPrimitiveSlowly(RebufferingInputStream.java:108)
at org.apache.cassandra.io.util.RebufferingInputStream.readInt(RebufferingInputStream.java:188)
at org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:179)
at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:94)

and

"epollEventLoopGroup-2-9": running
at io.netty.channel.epoll.Native.epollWait0(Native Method)
at io.netty.channel.epoll.Native.epollWait(Native.java:117)
at io.netty.channel.epoll.EpollEventLoop.epollWait(EpollEventLoop.java:226)
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:250)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
at java.lang.Thread.run(Thread.java:748)

The first stack above occurs 35 times in the dump and the second 24 times.

Can anyone figure out what is wrong here?
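
Counts like these can be reproduced from a saved thread dump by grouping threads on their top stack frame. Below is a minimal sketch, assuming the dump is plain text with thread blocks separated by blank lines and frames prefixed with `at `; the function name `count_top_frames` is made up for illustration. Note that the JVM reports threads sitting in native I/O calls such as `read0` or `epollWait0` as runnable even while they are only waiting for data, so a high count by itself does not show that these threads are using CPU.

```python
import re
from collections import Counter


def count_top_frames(dump):
    """Count threads in a text thread dump, grouped by top stack frame.

    Assumes thread blocks are separated by blank lines and that each
    frame line starts with "at " (after optional leading whitespace).
    """
    counts = Counter()
    for block in re.split(r"\n\s*\n", dump):
        for line in block.splitlines():
            line = line.strip()
            if line.startswith("at "):
                # The first frame in a block is the top of that thread's stack.
                counts[line[3:]] += 1
                break
    return counts
```

Calling `count_top_frames(open("dump.txt").read()).most_common(10)` would list the ten most frequent top frames; `dump.txt` is a placeholder file name.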

On the cluster side:

There are no pending compactions or tasks, and GC pauses are below 100ms.

Thanks

Stanislav Berkov · Answer 1 · 2023-07-29T02:02:29.727


I also experienced sudden high CPU usage by Cassandra, even with no reads or writes. Further investigation showed it was caused by a misconfigured Prometheus metric collector. I described how I tracked it down here: https://dev.to/stasberkov/cassandra-high-cpu-load-can-be-caused-by-prometheus-misconfiguration-oc0

TLDR: I used jvm-tools.
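
If jvm-tools is not at hand, a manual alternative (not the method from the linked post) is to take the busiest thread ids from `top -H -p <pid>` and look them up in `jstack <pid>` output, where each thread header carries the native thread id as a hex `nid=0x...` field. A rough sketch, assuming standard HotSpot `jstack` header lines on Linux; the function names are hypothetical:

```python
import re

# Matches a jstack thread header such as:
#   "name" #45 daemon prio=5 os_prio=0 tid=0x... nid=0x1a2b runnable [...]
NID_RE = re.compile(r'^"(?P<name>[^"]+)".*\bnid=(?P<nid>0x[0-9a-fA-F]+)')


def thread_names_by_tid(jstack_output):
    """Map decimal native thread id -> Java thread name."""
    names = {}
    for line in jstack_output.splitlines():
        match = NID_RE.match(line)
        if match:
            # nid is hexadecimal in jstack; top -H shows the same id in decimal.
            names[int(match.group("nid"), 16)] = match.group("name")
    return names


def name_hot_threads(hot_tids, jstack_output):
    """Pair each thread id taken from top -H with its Java thread name."""
    names = thread_names_by_tid(jstack_output)
    return [(tid, names.get(tid, "<not a Java thread>")) for tid in hot_tids]
```

Thread ids that resolve to a name show which Java threads are actually consuming CPU, which a list of runnable stacks alone cannot tell you.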

edited Jul 29 '23 at 02:02

answered Jul 29 '23 at 00:44

