
We're experiencing issues with continuously running Java applications that update counters in Cassandra. Monitoring the servers shows no correlation between the failures and the load. The queries are quite constant, because they update values in only 8 different tables. Every minute the Java applications fire thousands of queries (it can be 20k or even 50k), but every once in a while some of those fail. When that happens we write them to a file, along with the exception message. That message is always: "Cassandra timeout during write query at consistency ONE (1 replica were required but only 0 acknowledged the write)".

We did some googling and troubleshooting and took several actions:

  • Changed the retry policy in the Java applications from FallthroughRetryPolicy to DefaultRetryPolicy, so that the client retries a query on failure.
  • Changed the write_request_timeout_in_ms setting on the Cassandra nodes from the default value of 2000 to 4000 and later to 10000.

These actions reduced the number of failing queries, but they still occur. Out of the millions of queries that are executed on an hourly basis, we see about 2000 failures over a period of 24 hours. All of them produce the exception listed above, and they occur at varying times.

We can also see from the logs that when a query does fail, it takes a while, because the client waits for the timeout and then performs its retries.

Some facts:

  • We run Cassandra v2.2.5 (recently upgraded from v2.2.4)
  • We have a geo-aware Cassandra cluster with 6 nodes: 3 in Europe, 3 in the US.
  • The Java applications that fire the queries are the only clients that communicate with Cassandra (for now).
  • The number of Java applications is 10: 5 in the EU, 5 in the US.
  • We execute all queries asynchronously (session.executeAsync(statement);) and keep track of which individual queries succeed or fail by adding callbacks for success and failure (see the sketch after this list).
  • The replication factor is 2.
  • We run Oracle Java 1.7.0_76 Java(TM) SE Runtime Environment (build 1.7.0_76-b13) Java HotSpot(TM) 64-Bit Server VM (build 24.76-b04, mixed mode)
  • The 6 Cassandra nodes run on bare metal with the following specs:
    • Storage is a group of SSDs in RAID 5.
    • Each node has 2x 6-core Intel Xeon E5-2620 CPUs @ 2.00GHz (24 hardware threads in total).
    • The RAM size is 128GB.
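
For reference, this is roughly how we fire the counter updates and register the callbacks. It is a minimal sketch rather than our exact code: the class name, the UPDATE statement and the bound columns are assumptions based on the example table further down; the ResultSetFuture/Futures.addCallback pattern is the part we actually rely on.

import java.util.Date;

import com.datastax.driver.core.BoundStatement;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.ResultSetFuture;
import com.datastax.driver.core.Session;
import com.google.common.util.concurrent.FutureCallback;
import com.google.common.util.concurrent.Futures;

public class CounterWriter {

    private final Session session;
    private final PreparedStatement update;

    public CounterWriter(Session session) {
        this.session = session;
        // Hypothetical counter update; the real statements differ per table.
        this.update = session.prepare(
                "UPDATE traffic.per_node SET bytes = bytes + ?, hits = hits + ? "
                + "WHERE node = ? AND request_time = ?");
    }

    public void writeAsync(String node, Date requestTime, long bytes, long hits) {
        BoundStatement bound = update.bind(bytes, hits, node, requestTime);
        ResultSetFuture future = session.executeAsync(bound);
        Futures.addCallback(future, new FutureCallback<ResultSet>() {
            @Override
            public void onSuccess(ResultSet rs) {
                // count the write as successful
            }

            @Override
            public void onFailure(Throwable t) {
                // write the failed statement and t.getMessage() to a file
            }
        });
    }
}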

How we create the cluster:

private Cluster createCluster() {
    return Cluster.builder()
            .addContactPoints(contactPoints)
            .withRetryPolicy(DefaultRetryPolicy.INSTANCE)
            .withLoadBalancingPolicy(getLoadBalancingPolicy())
            .withReconnectionPolicy(new ConstantReconnectionPolicy(reconnectInterval))
            .build();
}
private LoadBalancingPolicy getLoadBalancingPolicy() {
    return DCAwareRoundRobinPolicy.builder()
            .withUsedHostsPerRemoteDc(allowedRemoteDcHosts) // == 3 
            .build();
}

How we create the keyspace:

CREATE KEYSPACE IF NOT EXISTS traffic WITH REPLICATION = { 'class': 'NetworkTopologyStrategy', 'AMS1': 2, 'WDC1': 2};

Example table (they all look similar):

CREATE TABLE IF NOT EXISTS traffic.per_node (
    node text,
    request_time timestamp,
    bytes counter,
    ssl_bytes counter,
    hits counter,
    ssl_hits counter,
    PRIMARY KEY (node, request_time)
) WITH CLUSTERING ORDER BY (request_time DESC)
    AND compaction = {'class': 'DateTieredCompactionStrategy'};
Balthasar

1 Answer


Many remarks:

  1. First, for the Cluster config, you should specify the local DC name.
  2. You should use LOCAL_ONE instead of ONE as the consistency level to enhance data locality (the sketch after this list shows both of these changes).
  3. DO NOT change the write_request_timeout_in_ms value. You're just sweeping issues under the carpet; your real issue is not the timeout setting.
  4. What is your replication factor?
  5. "Every minute the Java applications fire thousands of queries (can be 20k or even 50k queries)" --> simple maths gives me ~300 inserts/sec per node, with the assumption that RF=1. It is not that huge, but your inserts may be limited by hardware. What is your CPU config (number of cores) and disk type (spinning disk or SSD)?
  6. Do you throttle the async inserts? E.g. fire them in batches of N inserts and wait a little bit for the cluster to breathe. See my answer here for throttling: What is the best way to get backpressure for Cassandra Writes?
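
To make remarks 1 and 2 concrete, here is a sketch of how the createCluster() method from the question could be adapted. It is only a sketch: "AMS1" is an assumption for the local DC name (use whichever DC the application runs in), and contactPoints, allowedRemoteDcHosts and reconnectInterval are the fields already used in the question's snippet.

private Cluster createCluster() {
    return Cluster.builder()
            .addContactPoints(contactPoints)
            .withRetryPolicy(DefaultRetryPolicy.INSTANCE)
            .withLoadBalancingPolicy(DCAwareRoundRobinPolicy.builder()
                    .withLocalDc("AMS1") // assumption: the DC local to this application
                    .withUsedHostsPerRemoteDc(allowedRemoteDcHosts)
                    .build())
            // make LOCAL_ONE the default consistency level for every statement
            .withQueryOptions(new QueryOptions().setConsistencyLevel(ConsistencyLevel.LOCAL_ONE))
            .withReconnectionPolicy(new ConstantReconnectionPolicy(reconnectInterval))
            .build();
}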
doanduyhai
  • Thanks for the answer! (1) The list of servers we provide is the list of local nodes. [According to the documentation](https://docs.datastax.com/en/drivers/java/2.0/com/datastax/driver/core/policies/DCAwareRoundRobinPolicy.Builder.html#withLocalDc-java.lang.String-) this accomplishes the same. (2) We updated our code, thanks. (3) Agreed. (4) The replication factor is 2. Added it to the facts for clarity. (5) The Cassandra data is stored on SSDs in RAID 5. Updated the facts. (6) We don't throttle the inserts. Will consider this. -- We are curious to see the effect of your suggested changes! – Balthasar Feb 19 '16 at 09:18
  • Is the SSDs group in RAID5 shared by the 6 nodes or is it a config for **each** node ? – doanduyhai Feb 19 '16 at 09:41
  • Each node has its own SSDs. – Balthasar Feb 19 '16 at 09:48
  • That is weird; normally with SSDs and 24 cores, each node should be able to handle 1000+ inserts/sec. Did you monitor the system load with tools like **dstat** and **iostat** to see what the bottleneck is? Read this blog post for C* tuning: https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html – doanduyhai Feb 19 '16 at 09:55
  • Yes, we've been monitoring the nodes' load and IO using `iostat`. They're using a minor portion of their capacity, since the only thing they're running now is Cassandra. – Balthasar Feb 19 '16 at 10:30
  • More ideas: **nodetool compactionstats** to see if compaction is falling behind + **nodetool tpstats** to see if there are blocked stages – doanduyhai Feb 19 '16 at 11:18
  • I'm pre-emptively accepting your answer even though the problem isn't fixed yet, because (1) it's been very helpful and (2) we believe that throttling the queries will solve the problem. This week we've seen that the load on the Cassandra nodes shows enormous spikes. As said, the agents fire tens of thousands of queries at a time, once every 60 seconds. When they do, the load on the Cassandra servers increases dramatically. By spreading the queries over those 60 seconds we expect to solve the problem. – Balthasar Feb 26 '16 at 09:21
  • Good to know, just keep me updated when you throttle your inserts and if it reduces the spikes you have on the cluster – doanduyhai Feb 26 '16 at 13:22
  • **Hurray!! The query throttling solved our problem!** The processes have been running for 4 days straight and not a single query has failed. On average 63K queries per minute are fired, which are divided over 100 groups. The minimum group size is 250. The groups are executed with an interval of 600ms. – Balthasar Mar 08 '16 at 09:00
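
For completeness, the throttling scheme from the last comment boils down to something like the sketch below. The class and method names are made up, and the constants simply mirror the numbers mentioned above (100 groups per minute, a minimum group size of 250, and a 600ms interval between groups).

import java.util.List;

import com.datastax.driver.core.Session;
import com.datastax.driver.core.Statement;

public class ThrottledExecutor {

    private static final int GROUP_COUNT = 100;        // groups per minute
    private static final int MIN_GROUP_SIZE = 250;     // minimum statements per group
    private static final long GROUP_INTERVAL_MS = 600; // 100 groups * 600 ms ~= 60 s

    private final Session session;

    public ThrottledExecutor(Session session) {
        this.session = session;
    }

    /** Fires the statements in groups, pausing between groups instead of sending them all at once. */
    public void execute(List<Statement> statements) throws InterruptedException {
        int groupSize = Math.max(MIN_GROUP_SIZE,
                (statements.size() + GROUP_COUNT - 1) / GROUP_COUNT); // ceiling division
        for (int start = 0; start < statements.size(); start += groupSize) {
            int end = Math.min(start + groupSize, statements.size());
            for (Statement statement : statements.subList(start, end)) {
                session.executeAsync(statement); // success/failure callbacks omitted for brevity
            }
            Thread.sleep(GROUP_INTERVAL_MS); // give the cluster time to absorb the group
        }
    }
}

Spreading the statements over the full 60 seconds keeps the per-second load on the nodes roughly constant instead of producing a spike once per minute.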