
Following a study on scaling Native Transport Requests, I have configured the parameters below and the application seems to scale well, but I don't understand the following:

1) I have configured `native_transport_max_threads: 256`. As I understand it, this controls how many requests are handled concurrently, so shouldn't it be equal to the total number of cores? Why does it default to 128?

2) I set `-Dcassandra.max_queued_native_transport_requests=5192`. What is the problem with increasing these values?


1 Answer


1) It's the maximum number of concurrent requests the node can coordinate, not necessarily a limit on the work it actively does. This coordination includes things like waiting for the replicas (determined by the consistency level) to return data. That waiting is not active work, so there's no reason to cap it at the number of cores.
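For example, the consistency level set on a statement is what determines how many replica responses the coordinator waits for before it can answer the client, and that waiting ties up an NTR slot while using almost no CPU. A minimal sketch, assuming the DataStax Java driver 3.x and a hypothetical table `ks.tbl` (contact point and query are illustrative):

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;

public class ConsistencyExample {
    public static void main(String[] args) {
        // Hypothetical contact point and table, for illustration only.
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {
            // The coordinator that receives this request holds an NTR slot while it
            // waits for a QUORUM of replicas to respond; most of that time is idle.
            Statement stmt = new SimpleStatement("SELECT * FROM ks.tbl WHERE id = 1")
                    .setConsistencyLevel(ConsistencyLevel.QUORUM);
            ResultSet rs = session.execute(stmt);
            System.out.println(rs.one());
        }
    }
}
```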

2) When your application pushes more than your coordinator is configured to handle at once, the back pressure is applied in the coordinator's memory. The cost is heap pressure and less memory available to the rest of the system, plus the time requests spend sitting in the queue is added to your latency.
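To make that trade-off concrete, here is a minimal Java analogy (not Cassandra's actual implementation, and all numbers are illustrative): a fixed thread pool fed by a bounded queue, where the thread count stands in for `native_transport_max_threads` and the queue capacity for `max_queued_native_transport_requests`. Every queued task sits on the heap until a thread picks it up, and its time in the queue goes straight into the request latency:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class NtrQueueAnalogy {
    public static void main(String[] args) throws InterruptedException {
        int threads = 128;        // stand-in for native_transport_max_threads
        int queueCapacity = 1024; // stand-in for max_queued_native_transport_requests

        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                // When the queue is full, block the submitter instead of failing:
                // this mimics the back pressure described above.
                (task, executor) -> {
                    try {
                        executor.getQueue().put(task);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });

        for (int i = 0; i < 5_000; i++) {
            final int id = i;
            final long enqueuedAt = System.nanoTime();
            pool.execute(() -> {
                long queuedMs = (System.nanoTime() - enqueuedAt) / 1_000_000;
                try {
                    Thread.sleep(5); // simulate coordination: mostly waiting on replicas
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                if (id % 1_000 == 0) {
                    System.out.println("task " + id + " sat " + queuedMs + " ms in the queue");
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}
```

Raising `queueCapacity` in this sketch never makes individual tasks finish faster; it only changes how much work is allowed to wait in memory, which is exactly the heap-versus-latency cost above.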

Per your other question, I think you may be getting too focused on the NTR stage when the problem is likely in your data model/queries. If increasing that queue didn't help, it's probably not the cause. Typically the only scenario where queued NTRs are the issue is when you slam a LOT of tiny queries at once (usually more than a single client can make, since there's a 1,024 default limit per host). That's pretty much the only scenario where increasing the queue limit to smooth out the spikes helps. If it doesn't help, use proxyhistograms/tablehistograms/tablestats to narrow down the table and query causing the pressure. If it's not obvious, it may be a GC-related issue, or both.
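For reference, the per-host limit mentioned above is enforced on the client side by the driver's connection pooling; a hedged sketch assuming the DataStax Java driver 3.x (contact point and numbers are illustrative):

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.HostDistance;
import com.datastax.driver.core.PoolingOptions;

public class PoolingExample {
    public static void main(String[] args) {
        // The 3.x driver caps concurrent in-flight requests per connection; adding
        // connections (or hosts) spreads load rather than piling it all onto one
        // coordinator's NTR queue.
        PoolingOptions pooling = new PoolingOptions()
                .setConnectionsPerHost(HostDistance.LOCAL, 1, 4)        // core, max connections
                .setMaxRequestsPerConnection(HostDistance.LOCAL, 1024); // per-connection cap

        try (Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1") // illustrative contact point
                .withPoolingOptions(pooling)
                .build()) {
            cluster.connect();
        }
    }
}
```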

Chris Lohfink
  • "I think you may be getting too focused on the NTR stage when the problem is likely in your data model/queries." My thoughts exactly. – Aaron May 21 '18 at 15:48
  • @Chris Lohfink my GC is less than 1.2 seconds, so it's fine. Does setting -Dcassandra.max_queued_native_transport_requests=5192 make sense? As per your comment, a single node can only handle 1024 per connection – Harry May 21 '18 at 16:22
  • @Chris Lohfink also I verified the data model and queries; all seem good as per DataStax data modelling best practices – Harry May 21 '18 at 16:24
  • Technically the limit is 32k (or 64k?) in-flight requests per connection (seq_id in the protocol), but the driver limits it to 1k by default, as at that point it's better to just distribute over other connections. There are gaps in the best-practice rules, as they cannot conceivably test for all possible issues, just some of the more common ones (I actually wrote some of those :)). GC is good to rule out; if you ruled out NTR as the issue by increasing the queue size, then it's time to start looking at other things. 5192 is fine; at that point increasing it more will not help - it's a symptom, not a cause. – Chris Lohfink May 21 '18 at 17:26
  • @ChrisLohfink Thanks for the explanation. I am trying to understand one final point: for one write request on a single node, what NTR operations are performed? I am trying to understand why setting 256 and 5192 scales. – Harry May 22 '18 at 06:11
  • There's a thread pool to handle the requests, and a queue to hold requests when all the threads are busy. When the queue is full it blocks; `max_queued_native_transport_requests` is how many tasks it lets queue up. Letting more queue up can help in situations where there are spikes in load, because it can still complete them within the timeout. If your requests are too slow because of something else, you're not helping things by just letting them queue up more. – Chris Lohfink May 22 '18 at 13:34