
I am benchmarking a Cassandra cluster and hence using the cassandra-stress tool. I was able to insert 1M records into one of the tables with a replication factor of 2, CL QUORUM, and a rate of 40 threads, on 2 nodes, running stress from 11.43.600.66.

./cassandra-stress user profile=demo.yaml n=1000000 ops(insert=1,likelyquery0=2) cl=quorum -node 11.43.600.66,11.43.600.65 -rate threads=40

**demo.yaml script:**  
columnspec:  
  - name: user_name  
    size: gaussian(20..45)  
    population: gaussian(10000..50000)  
  - name: system_name  
    size: gaussian(20..45)  
    population: gaussian(50..60)  
  - name: time  
    size: uniform(15..25)  
    population: uniform(100000..1000000)  
  - name: request_uri  
    size: gaussian(50..80)  
    population: gaussian(100..150)  

insert:  
  partitions: fixed(1)            
  select:  fixed(1)/1000        
  batchtype: UNLOGGED   

I am trying to compare the results of nodetool cfstats and cfhistograms with those from OpsCenter. The table-level write and read request latencies (ms/op) from OpsCenter are:

[WriteRequestLatency and ReadRequestLatency graphs]

cfhistograms results, used to calculate the write and read latencies (in microseconds):

[cfhistograms stats]

cfstats results, in milliseconds:

[cfstats results]
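
For context, the node-local numbers above come from nodetool run against the table under test, along these lines (keyspace and table names are placeholders):

nodetool cfhistograms <keyspace> <table>  
nodetool cfstats <keyspace>.<table>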

a) As per the results of cfstats and cfhistograms:  
Write Latency: 0.0117 ms = 11.7 µs  
Read Latency: 0.0943 ms = 94.3 µs  
This approximately matches the 50% values:  
Write Latency: 10 µs  
Read Latency: 103 µs

Question 1: On what percentile are the cfstats and cfhistograms results based? I would normally assume 95%, but at 95% the cfstats results don't match cfhistograms here. Am I missing something?

b) From the OpsCenter results:  
Write Latency: 1.6 ms/op = 1600 µs  
Read Latency: 1.9 ms/op = 1900 µs

Question 2: Why is there a mismatch between the cfhistograms and OpsCenter results? Should the y-axis values of the OpsCenter write/read request latency graphs be in µs/op instead of ms/op?

Versions:
Cassandra 2.1.8.689
OpsCenter 5.2.2

Please let me know if I am wrong!
Thanks

Arun

1 Answer


These are two different kinds of metrics, tracked in statistically different ways.

To start with, the cluster read/write latencies are the coordinator's view, including possible cross-node communication. In OpsCenter, if you hover over the metric you get the definition:

The average response times (in milliseconds) of a client write. The time period starts when a node receives a client write request, and ends when the node responds back to the client. Depending on consistency level and replication factor, this may include the network latency from writing to the replicas.

cfhistograms, on the other hand, shows the latencies local to that node. This is also kept in OpsCenter under the CF: or TBL: metrics (depending on version); the percentile graph is what actually shows it:

The min, median, max, 90th, and 99th percentile of the response time to read data from the memtable and sstables for a specific table. The elapsed time from when the replica receives the request from a coordinator and sends a response.

[histogram-based latencies graph]

So in terms of what the two metrics describe, they cover different levels of the read/write path.

Additionally, the statistics used to measure them are different.

The average latency takes the total time spent on coordinator writes since the last check, divided by the number of coordinator writes since the last check (see https://github.com/apache/cassandra/blob/94ff639429a65acb5f122ed559e98dd60a40e42d/src/java/org/apache/cassandra/metrics/LatencyMetrics.java#L125). This can be far off from what you expect, since a lot of sub-millisecond requests plus a single 30-second one can still average out to around 1 ms.
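
As a toy illustration of how that average hides the shape of the distribution (made-up numbers, plain Java, not the actual LatencyMetrics code):

```java
// Toy numbers only: many fast writes plus one very slow one still
// average out to about a millisecond.
public class AverageSkew {
    public static void main(String[] args) {
        long totalMicros = 0;
        long count = 0;

        // 30,000 coordinator writes at ~100 microseconds each
        for (int i = 0; i < 30_000; i++) {
            totalMicros += 100;
            count++;
        }

        // plus a single pathological 30-second write
        totalMicros += 30_000_000L;
        count++;

        double avgMs = (totalMicros / (double) count) / 1000.0;
        // prints ~1.10 ms even though the typical request took 0.10 ms
        System.out.printf("average = %.2f ms%n", avgMs);
    }
}
```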

The "better" metrics still have some statistical loss but is much better at describing the distribution of the latencies. These (the percentile ones in opscenter of cfhistograms) get updated by representing the latencies in buckets each representing a time range. This histogram gets updated during a request. In OpsCenter it keeps track of the state of the histogram every minute, and from the difference can determine how many requests occurred in each period of time. This also allows more statistically accurate merging of data across nodes in the cluster view. If one node has 1000 requests and another has 1, averaging will give the half way. By adding the totals of each nodes buckets it can represent the actual latency distribution better. There is still loss here but its minor comparatively. Each bucket represents a range and we don't know where in that range each of the requests in that bucket occurred but its small enough to be "good enough" and represent the order of magnitude well enough.

Nodetool cfhistograms had a few versions to be wary of. It used a reservoir sampling algorithm (Vitter's Algorithm R, https://en.wikipedia.org/wiki/Reservoir_sampling) instead of a histogram, which is built around the idea that a normal distribution can be represented by a smaller sample of the data. Unfortunately, latencies are a very heavy-tailed, non-normal distribution, and the result can easily be orders of magnitude off. See https://issues.apache.org/jira/browse/CASSANDRA-8662.
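
For reference, a bare-bones sketch of Algorithm R, showing why a small fixed-size sample tends to miss a heavy tail entirely (the reservoir size and latency values here are made up, and this is the general technique, not the exact library code Cassandra used):

```java
import java.util.Arrays;
import java.util.Random;

// Vitter's Algorithm R: keep a uniform random sample of k items from a stream.
public class ReservoirR {
    static long[] sample(long[] stream, int k, Random rng) {
        long[] reservoir = new long[k];
        for (int i = 0; i < stream.length; i++) {
            if (i < k) {
                reservoir[i] = stream[i];      // fill the reservoir first
            } else {
                int j = rng.nextInt(i + 1);    // item i survives with probability k/(i+1)
                if (j < k) reservoir[j] = stream[i];
            }
        }
        return reservoir;
    }

    public static void main(String[] args) {
        // one million ~100us latencies plus five 2-second outliers
        long[] latencies = new long[1_000_005];
        Arrays.fill(latencies, 100);
        for (int i = 0; i < 5; i++) latencies[1_000_000 + i] = 2_000_000;

        long[] kept = sample(latencies, 1028, new Random(42));
        long outliersKept = Arrays.stream(kept).filter(v -> v > 100).count();
        // Each outlier only has a ~1028/1000005 chance of being kept, so the
        // sample usually contains none of them and the tail is invisible.
        System.out.println("outliers kept in sample of 1028: " + outliersKept);
    }
}
```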

Chris Lohfink