
We have a 3-node dev Cassandra cluster that we upgraded from 3.11.13 to 4.0.7. We've been sending DDL statements through our Java applications using spring-data-cassandra:3.4.6, which uses DataStax Java Driver 4.14.1, and we never had any issues with it until the upgrade to 4.0.7.

The main issue we're facing with 4.0.7 is schema disagreement when tables are created programmatically, which was never an issue for us on 3.11.x. DDL statements run through cqlsh work as expected; we only see schema disagreements when the tables are created programmatically.

We’ve tried different cluster setups, C* versions, and Ubuntu versions, but we still face the same issue:

3-node, single-rack DC (Ubuntu 18.04, 20.04, 22.04; Cassandra 4.0.x, 4.1.x)

3-node, 3-rack DC (Ubuntu 18.04, 20.04, 22.04; Cassandra 4.0.x, 4.1.x). This is the setup we've been using since 3.11.x.

We've also tried tuning the driver configuration, such as adjusting the timeouts and effectively disabling debouncing, but with no luck; we still face the same issue:

    advanced.control-connection {
        schema-agreement {
            interval = 500 milliseconds
            timeout = 10 seconds
            warn-on-failure = true
        }
    },
    advanced.metadata {
        topology-event-debouncer {
            window = 1 milliseconds
            max-events = 1
        }
        schema {
            request-timeout = 5 seconds
            debouncer {
                window = 1 milliseconds
                max-events = 1
            }
        }
    }
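To make the two schema-agreement settings above (interval and timeout) concrete, here is a simplified, stdlib-only model of the check they control: poll every node's schema version at each interval and stop once all nodes report the same UUID or the timeout expires. The class and method names are hypothetical, and this is not the driver's actual SchemaAgreementChecker (which, as far as we understand, compares the schema_version values it reads from system.local and system.peers over the control connection). It only mimics the polling behaviour visible in the DEBUG log further down.

```java
import java.time.Duration;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.UUID;
import java.util.function.Supplier;

public class SchemaAgreementModel {

    /**
     * Hypothetical model: polls the per-node schema versions every {@code interval}
     * until all nodes report the same UUID (returns true) or {@code timeout}
     * elapses (returns false).
     */
    public static boolean waitForAgreement(Supplier<List<UUID>> nodeVersions,
                                           Duration interval,
                                           Duration timeout) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            Set<UUID> distinct = new HashSet<>(nodeVersions.get());
            if (distinct.size() == 1) {
                return true; // every node reports the same schema version
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // give up; warn-on-failure = true would make the driver log a warning here
            }
            // Mirrors the shape of the driver's DEBUG message seen in the question.
            System.out.println("Schema agreement not reached yet (" + distinct
                    + "), rescheduling in " + interval.toMillis() + " ms");
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop waiting
                return false;
            }
        }
    }

    public static void main(String[] args) {
        UUID a = UUID.fromString("09989a2c-7348-3117-8b4a-d5cad549bc09");
        UUID b = UUID.fromString("f4c8755d-6fec-38fe-984f-4083f4a0a0a0");

        // The third node lags for the first two polls, then catches up.
        int[] polls = {0};
        Supplier<List<UUID>> catchingUp =
                () -> polls[0]++ < 2 ? List.of(a, a, b) : List.of(a, a, a);
        System.out.println(waitForAgreement(catchingUp, Duration.ofMillis(50), Duration.ofSeconds(1)));

        // The third node never catches up, so the wait ends at the timeout.
        Supplier<List<UUID>> stuck = () -> List.of(a, a, b);
        System.out.println(waitForAgreement(stuck, Duration.ofMillis(50), Duration.ofMillis(200)));
    }
}
```

The model shows why shortening the interval or lengthening the timeout only changes how often and how long the client waits: if one node keeps reporting a different schema version, agreement is never reached no matter how the polling is tuned.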

We’re creating tables programmatically through the following snippets:

    @Override
    protected abstract List<String> getStartupScripts();

    @Bean
    SessionFactoryInitializer sessionFactoryInitializer(SessionFactory sessionFactory) {
        SessionFactoryInitializer initializer = new SessionFactoryInitializer();
        initializer.setSessionFactory(sessionFactory);
        final ResourceKeyspacePopulator resourceKeyspacePopulator = new ResourceKeyspacePopulator();

        getStartupScripts().forEach(script -> resourceKeyspacePopulator.addScript(scriptOf(script)));
        initializer.setKeyspacePopulator(resourceKeyspacePopulator);
        return initializer;
    }

And we define a table like this:

    @Override
    protected List<String> getStartupScripts() {
        return Arrays.asList(testTable());
    }


    private String testTable() {
        return "CREATE TABLE IF NOT EXISTS test_table ("
                + "test text, "
                + "test2 text, "
                + "createdat bigint, "
                + "PRIMARY KEY(test, test2))";
    }

But we end up in a loop that lasts until the request times out because of the schema disagreement, with the following errors:

    DEBUG com.datastax.oss.driver.internal.core.metadata.SchemaAgreementChecker - [s1] Schema agreement not reached yet ([09989a2c-7348-3117-8b4a-d5cad549bc09, f4c8755d-6fec-38fe-984f-4083f4a0a0a0]), rescheduling in 500 ms
    WARN org.springframework.context.support.GenericApplicationContext - Exception encountered during context initialization - cancelling refresh attempt: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'sessionFactoryInitializer' defined in com.bitcoin.wallet.config.CassandraConfig: Bean instantiation via factory method failed; nested exception is org.springframework.beans.BeanInstantiationException: Failed to instantiate [org.springframework.data.cassandra.core.cql.session.init.SessionFactoryInitializer]: Factory method 'sessionFactoryInitializer' threw exception; nested exception is org.springframework.data.cassandra.core.cql.session.init.ScriptStatementFailedException: Failed to execute CQL script statement #1 of Byte array resource [resource loaded from byte array]: CREATE TABLE IF NOT EXISTS test_table (test text,test2 text,createdat bigint,PRIMARY KEY(test, test2)); nested exception is com.datastax.oss.driver.api.core.DriverTimeoutException: Query timed out after PT10S
  • Welcome to Stack Overflow! Your question is missing a few details. The general guidance is that you (a) provide a good summary of the problem that includes software/component versions, the full error message + full stack trace; (b) describe what you've tried to fix the problem, details of investigation you've done; and (c) minimal sample code that replicates the problem. I highly recommend https://stackoverflow.com/tour to familiarise yourself with the site. Cheers! – Erick Ramirez Feb 13 '23 at 03:15
  • Hi! I've updated the post to include more details above. Thanks for the tips! – Kenji Tanaka Feb 14 '23 at 09:18

1 Answer


So two things come to mind when reading through this:

  1. Schema disagreements are often a symptom of some larger issue.

Does the node have its CPU pegged at 100%? Schema disagreement. Inefficient network routing? Schema disagreement. Disk IOPS maxed out, causing write back-pressure? Schema disagreement.

I'd have a look at the activity on the nodes and see if any of the above stand out.

  2. Programmatic schema changes are often problematic.

Each node needs to store the complete schema, so every schema change is sent to all nodes, which essentially makes schema changes run at an asynchronous consistency level of ALL. Because of that, there's no margin for error. And programmatic schema changes are often sent from within an application much faster than Cassandra can reconcile them.

My recommendations for making any schema changes:

  • Execute during off-peak times.
  • Only run them when all nodes are UN (Up/Normal in nodetool status).
  • Run them using cqlsh (not from application code).
  • Verify each individual change using nodetool describecluster.
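The last recommendation can be scripted. Below is a minimal sketch that takes the text printed by nodetool describecluster and reports whether the cluster has converged on a single schema version. It assumes the usual layout of the "Schema versions:" section (one "uuid: [node, node, ...]" line per version, plus an "UNREACHABLE: [...]" line when nodes are down); check that assumption against the output of your own Cassandra version. DescribeClusterCheck, schemaVersions and inAgreement are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DescribeClusterCheck {

    // Assumed line shape under "Schema versions:", e.g.
    //     09989a2c-7348-3117-8b4a-d5cad549bc09: [10.0.0.1, 10.0.0.2, 10.0.0.3]
    private static final Pattern VERSION_LINE = Pattern.compile(
            "\\s*([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}):\\s*\\[(.*)\\]\\s*");

    /** Maps each schema version UUID found in the output to the node list reported for it. */
    public static Map<String, String> schemaVersions(String describeClusterOutput) {
        Map<String, String> versions = new LinkedHashMap<>();
        for (String line : describeClusterOutput.split("\\R")) {
            Matcher m = VERSION_LINE.matcher(line);
            if (m.matches()) {
                versions.put(m.group(1), m.group(2).trim());
            }
        }
        return versions;
    }

    /**
     * True only if exactly one schema version is reported and no node is listed
     * as UNREACHABLE (assumed marker for nodes that could not be contacted).
     */
    public static boolean inAgreement(String describeClusterOutput) {
        return schemaVersions(describeClusterOutput).size() == 1
                && !describeClusterOutput.contains("UNREACHABLE");
    }

    public static void main(String[] args) {
        String agreed = String.join("\n",
                "Cluster Information:",
                "\tSchema versions:",
                "\t\t09989a2c-7348-3117-8b4a-d5cad549bc09: [10.0.0.1, 10.0.0.2, 10.0.0.3]");
        String split = String.join("\n",
                "Cluster Information:",
                "\tSchema versions:",
                "\t\t09989a2c-7348-3117-8b4a-d5cad549bc09: [10.0.0.1, 10.0.0.2]",
                "\t\tf4c8755d-6fec-38fe-984f-4083f4a0a0a0: [10.0.0.3]");
        System.out.println(inAgreement(agreed)); // true
        System.out.println(inAgreement(split));  // false
        System.out.println(schemaVersions(split));
    }
}
```

In practice you would run nodetool describecluster after each individual schema change and only apply the next one once the check reports a single version.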
Aaron