
We are running Cassandra 2.0.9 in production on a 4-node cluster. For the past few days we have been seeing high spikes in CPU utilisation, as shown in the picture below.

[image: top output]

This is the jconsole output: [image: jconsole]

When we looked into the threads that are consuming the most CPU, we found that the Native Transport Request threads are using a lot of it (around 12%), which is huge.

Thread stack trace: [image: stack trace]

Thread info: [image: thread info]

Thread CPU%: [image: thread top]

What could the problem be, and how should we go about debugging it?

Why are the majority of the NTR requests stuck in BCrypt.java? Is this the problem?

The cluster was behaving normally a few days ago, but now 3 of the 4 nodes are constantly at high CPU utilisation.

Vinoth Kumar J

1 Answer


You have authentication enabled, and Cassandra stores a bcrypt hash, not the password, so the credentials have to be checked against that hash each time a connection authenticates. This will end up being a CPU issue if you are continually creating new connections instead of reusing an authenticated session. Sessions are long-lived objects and should be persistent by default (https://github.com/datastax/php-driver/tree/master/features#persistent-sessions), but if you are using CGI or something else that constantly creates new processes you will still have issues. Maybe try php-fpm?
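To make the cost difference concrete, here is a rough, self-contained sketch in Python. It is not Cassandra or the PHP driver: `FakeServer`, `FakeSession`, and the `ITERATIONS` work factor are made up for illustration, and PBKDF2 from the standard library stands in for bcrypt as a deliberately slow hash. It only shows how many hash checks the server performs when each request opens its own connection compared with when one authenticated session is reused.

```python
import hashlib
import hmac
import os
import time

# Assumption: PBKDF2 stands in for bcrypt, and this work factor is arbitrary.
ITERATIONS = 20_000


def slow_hash(password: str, salt: bytes) -> bytes:
    """Deliberately expensive password hash (stand-in for bcrypt)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)


class FakeServer:
    """Toy node that stores only a salted hash, never the password."""

    def __init__(self, password: str) -> None:
        self.salt = os.urandom(16)
        self.stored_hash = slow_hash(password, self.salt)
        self.auth_checks = 0  # number of expensive hash checks performed

    def authenticate(self, password: str) -> bool:
        self.auth_checks += 1
        candidate = slow_hash(password, self.salt)
        return hmac.compare_digest(candidate, self.stored_hash)


class FakeSession:
    """Toy session: pays the authentication cost once, when it is created."""

    def __init__(self, server: FakeServer, password: str) -> None:
        if not server.authenticate(password):
            raise PermissionError("bad credentials")
        self.server = server

    def execute(self, query: str) -> str:
        return f"ok: {query}"


def connect_per_request(server: FakeServer, password: str, n: int) -> None:
    """Each request opens a new session, like a new process per request."""
    for i in range(n):
        FakeSession(server, password).execute(f"query {i}")


def reuse_session(server: FakeServer, password: str, n: int) -> None:
    """One long-lived session serves every request."""
    session = FakeSession(server, password)
    for i in range(n):
        session.execute(f"query {i}")


if __name__ == "__main__":
    for label, run in (("per request", connect_per_request),
                       ("reused", reuse_session)):
        server = FakeServer("secret")
        start = time.perf_counter()
        run(server, "secret", 20)
        elapsed = time.perf_counter() - start
        print(f"{label}: {server.auth_checks} hash checks, {elapsed:.3f}s")
```

In this toy model the per-request variant performs one expensive hash check per request, while the reused session performs exactly one in total. The absolute timings depend on the machine and on the arbitrary work factor, so only the ratio is meaningful.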

Chris Lohfink
  • Thanks for that reply. We are using persistent connections. It appears you are right, because all our requests are actually authenticating and then processing the request. How can I stop that? When I run phpinfo() and grep for cassandra I get this: Cassandra support => enabled; C/C++ driver version => 2.2.2; Persistent Clusters => 0; Persistent Sessions => 0; Directive => Local Value => Master Value; cassandra.log => cassandra.log => cassandra.log; cassandra.log_level => ERROR => ERROR. Does that mean I am not using persistent connections? We are not using php-fpm. – Vinoth Kumar J Sep 01 '16 at 16:30
  • A lot will depend on what your tech stack for the PHP server is and how it's set up. The persistent connections only live as long as the process, so if the requests are being handled by a new process each time, they won't persist. – Chris Lohfink Sep 01 '16 at 16:33