
I learned a few things about Native Transport Requests in Cassandra from this question: What are native transport requests in Cassandra?

As I understand it, any query I execute in Cassandra is a Native Transport Request.

I frequently get Request Timed Out errors in Cassandra, and I observed the following in the Cassandra debug log as well as in the output of nodetool tpstats:

/var/log/cassandra# nodetool tpstats
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
MutationStage                     0         0      186933949         0                 0
ViewMutationStage                 0         0              0         0                 0
ReadStage                         0         0      781880580         0                 0
RequestResponseStage              0         0        5783147         0                 0
ReadRepairStage                   0         0              0         0                 0
CounterMutationStage              0         0       14430168         0                 0
MiscStage                         0         0              0         0                 0
CompactionExecutor                0         0         366708         0                 0
MemtableReclaimMemory             0         0            788         0                 0
PendingRangeCalculator            0         0              1         0                 0
GossipStage                       0         0              0         0                 0
SecondaryIndexManagement          0         0              0         0                 0
HintsDispatcher                   0         0              0         0                 0
MigrationStage                    0         0              0         0                 0
MemtablePostFlush                 0         0            799         0                 0
ValidationExecutor                0         0              0         0                 0
Sampler                           0         0              0         0                 0
MemtableFlushWriter               0         0            788         0                 0
InternalResponseStage             0         0              0         0                 0
AntiEntropyStage                  0         0              0         0                 0
CacheCleanupExecutor              0         0              0         0                 0
Native-Transport-Requests         0         0      477629331         0           1063468

Message type           Dropped
READ                         0
RANGE_SLICE                  0
_TRACE                       0
HINT                         0
MUTATION                     0
COUNTER_MUTATION             0
BATCH_STORE                  0
BATCH_REMOVE                 0
REQUEST_RESPONSE             0
PAGED_RANGE                  0
READ_REPAIR                  0

1) What does the All time blocked column mean?
2) What does the value 1063468 denote? How harmful is it?
3) How can I tune this?

Vadim Kotov
Harry

1 Answer


Each request is processed by the Native-Transport-Requests (NTR) stage before being handed off to the read/mutation stage, but the NTR thread still blocks while waiting for completion. To avoid being overloaded, the stage starts blocking new tasks from being added to its queue, which applies back pressure to the client. Every time a request is blocked, the All time blocked counter is incremented. So 1063468 requests have, at some point, been blocked for some period of time because too many requests were backed up.

In situations where the app has spikes of queries, this blocking is unnecessary and can cause issues, so you can raise the queue limit with something like -Dcassandra.max_queued_native_transport_requests=4096 (default 128). You can also throttle requests on the client side, but I'd try increasing the queue size first.

There may also be some request that is exceptionally slow and is clogging up your system. If you have monitoring set up, look at the high-percentile read/write coordinator latencies. You can also use nodetool proxyhistograms. There may be something in your data model or queries that is causing issues.
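Because the All time blocked counter only ever grows while the node is up, a single reading says little about whether blocking is still happening. A rough sketch of one way to check, assuming you capture the text of nodetool tpstats twice, a known number of seconds apart: parse both samples and compute the rate of newly blocked requests. The function names and the second sample's value are made up for illustration; only the first sample's numbers come from the output in the question.

```python
def parse_all_time_blocked(tpstats_output):
    """Return {pool_name: all_time_blocked} parsed from `nodetool tpstats` text."""
    blocked = {}
    for line in tpstats_output.splitlines():
        fields = line.split()
        # Pool rows are a name followed by five integer columns:
        # Active, Pending, Completed, Blocked, All time blocked.
        # The header row and the "Dropped" message rows do not match this shape.
        if len(fields) == 6 and all(f.isdigit() for f in fields[1:]):
            blocked[fields[0]] = int(fields[5])
    return blocked


def blocked_per_second(earlier, later, interval_seconds,
                       pool="Native-Transport-Requests"):
    """Rate of newly blocked requests between two tpstats samples."""
    before = parse_all_time_blocked(earlier)[pool]
    after = parse_all_time_blocked(later)[pool]
    return (after - before) / interval_seconds


# First sample: rows copied from the question's output.
sample_a = """\
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
ReadStage                         0         0      781880580         0                 0
Native-Transport-Requests         0         0      477629331         0           1063468

Message type           Dropped
READ                         0
"""

# Second sample: HYPOTHETICAL values, as if taken 60 seconds later.
sample_b = """\
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
ReadStage                         0         0      781900000         0                 0
Native-Transport-Requests         0         0      477650000         0           1063768
"""

print(blocked_per_second(sample_a, sample_b, 60))  # 5.0
```

A rate that stays at zero suggests the blocking happened during past spikes; a steadily positive rate suggests the queue is still filling up. In practice the two samples would come from running nodetool tpstats on the node rather than from hard-coded strings.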

Chris Lohfink
  • Thanks for responding. I still have a few questions: 1) Why is this value (All time blocked) not decreasing? 2) Do I need to set the parameter -Dcassandra.max_queued_native_transport_requests=4096 in the JVM options? 3) I am using Apache Cassandra. Could you point me to a monitoring tool available for it? – Harry May 20 '18 at 15:35
  • Also, do you think this could cause the request timeouts? – Harry May 20 '18 at 15:38
  • What is the optimal value for cassandra.max_queued_native_transport_requests? – Harry May 20 '18 at 15:46
  • All time blocked counts how many tasks have been blocked since Cassandra last started, so it would not make sense for it to decrease. You don't need to set the parameter, but it may help. The blocking may be part of the cause of the request timeouts, but a bad query or table could also be what causes the blocking. Look at proxyhistograms and tablehistograms to find the table with slow queries and debug from there. Optimal value: it depends; if there were a single optimal value, it would be the default. It varies with use case and data model. – Chris Lohfink May 21 '18 at 13:23