
I need some help diagnosing high CPU utilisation in Cassandra.

We have a 12-node Cassandra cluster; each node has the following spec:

  • 8 cores
  • 16GB HEAP/32GB RAM with G1GC

All of a sudden I started seeing high CPU load (around 18-24 on 8-core nodes).

The Cassandra thread dump shows a lot of runnable threads like the ones below.

sun.nio.ch.FileDispatcherImpl.read0(Native Method)
 MessagingService-Incoming-/10.xx.xx.xx
 MessagingService-Incoming-/10.xx.xx.xx
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at org.apache.cassandra.io.util.NIODataInputStream.reBuffer(NIODataInputStream.java:66)
at org.apache.cassandra.io.util.RebufferingInputStream.readByte(RebufferingInputStream.java:144)
at org.apache.cassandra.io.util.RebufferingInputStream.readPrimitiveSlowly(RebufferingInputStream.java:108)
at org.apache.cassandra.io.util.RebufferingInputStream.readInt(RebufferingInputStream.java:188)
at org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:179)
at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:94)

and

"epollEventLoopGroup-2-9": running
at io.netty.channel.epoll.Native.epollWait0(Native Method)
at io.netty.channel.epoll.Native.epollWait(Native.java:117)
at io.netty.channel.epoll.EpollEventLoop.epollWait(EpollEventLoop.java:226)
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:250)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
at java.lang.Thread.run(Thread.java:748)

The first stack trace above appears in 35 threads and the second in 24.

Can anyone figure out what is wrong here?
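Counts like these can be reproduced from a saved thread dump with a few lines of Python. This is only a minimal sketch: it assumes jstack-style output, where each thread starts with a quoted name and its frames start with "at", and the sample dump, thread ids and IP addresses are made up for illustration. Keep in mind that the JVM reports a thread waiting inside a native call such as read0 or epollWait0 as RUNNABLE, so a high count here does not by itself show which threads are burning CPU.

```python
import re
from collections import Counter

# A thread header in jstack-style output starts with the quoted thread name.
THREAD_HEADER = re.compile(r'^"(?P<name>[^"]+)"')
# A stack frame line looks like: "at package.Class.method(Source.java:123)".
FRAME = re.compile(r'^\s*at\s+(?P<frame>[^\s(]+)\(')


def count_top_frames(dump_text):
    """Count how many threads share each topmost stack frame."""
    counts = Counter()
    awaiting_top_frame = False
    for line in dump_text.splitlines():
        if THREAD_HEADER.match(line):
            # The next "at ..." line belongs to this thread's top frame.
            awaiting_top_frame = True
            continue
        match = FRAME.match(line)
        if match and awaiting_top_frame:
            counts[match.group("frame")] += 1
            awaiting_top_frame = False
    return counts


# Illustrative sample only (hypothetical thread ids and addresses).
SAMPLE = '''\
"MessagingService-Incoming-/10.0.0.1" #101 daemon prio=5 runnable
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)

"MessagingService-Incoming-/10.0.0.2" #102 daemon prio=5 runnable
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)

"epollEventLoopGroup-2-9" #55 daemon prio=5 runnable
   java.lang.Thread.State: RUNNABLE
    at io.netty.channel.epoll.Native.epollWait0(Native Method)
'''

if __name__ == "__main__":
    for frame, n in count_top_frames(SAMPLE).most_common():
        print(n, frame)
```

To run it against a real dump, read the file contents into a string and pass that to count_top_frames instead of SAMPLE.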

On the cluster side:

  • There are no pending compactions or tasks.
  • GC pauses are below 100 ms.

Thanks

sandeep

1 Answer


I also experienced sudden high CPU usage by Cassandra, even with no reads or writes. Further investigation showed it was caused by a misconfigured Prometheus metrics collector. I described how I tracked it down here: https://dev.to/stasberkov/cassandra-high-cpu-load-can-be-caused-by-prometheus-misconfiguration-oc0

TL;DR: I used jvm-tools to find the cause.

Stanislav Berkov