
The Problem

A simple CQL SELECT is failing when I have a large data load.

Setup

I am using the following Cassandra schema:

CREATE KEYSPACE fv WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };


CREATE TYPE entity_state (
    identifier text,
    number1 int,
    number2 double,
    entity_type text,
    string1 text,
    string2 text
);

create table entity_by_identifier (
    identifier text,
    state entity_state,
    PRIMARY KEY(identifier)
);

The query I am trying to execute:

SELECT * FROM fv.entity_by_identifier WHERE identifier=:identifier;
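
For reference, the query is executed through the DataStax Java driver, roughly like the following sketch (not my exact code; the session setup is omitted and someIdentifier stands in for the bound value):

import com.datastax.driver.core.*;

// Simplified sketch: prepare the statement once, then bind the named
// parameter for each query.
PreparedStatement prepared = session.prepare(
    "SELECT * FROM fv.entity_by_identifier WHERE identifier = :identifier");
Row row = session.execute(
    prepared.bind().setString("identifier", someIdentifier)).one();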

The Issue

This query works fine on a small dataset (tried with 500 rows). However, for a large data load test, I create over 5 million rows in this table before executing this query repeatedly (10 threads continuously performing the query for 1 hour).
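
The shape of that load test is roughly the following sketch (the exact harness code is omitted; nextIdentifier() is a stand-in for however each thread picks a key):

import java.util.concurrent.*;

// Simplified sketch of the load test: 10 threads issuing the prepared
// SELECT from above in a loop for one hour.
ExecutorService pool = Executors.newFixedThreadPool(10);
long end = System.currentTimeMillis() + TimeUnit.HOURS.toMillis(1);
for (int i = 0; i < 10; i++) {
    pool.submit(() -> {
        while (System.currentTimeMillis() < end) {
            session.execute(prepared.bind()
                .setString("identifier", nextIdentifier()));
        }
    });
}
pool.shutdown();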

Once the data load has completed, the queries begin but immediately fail with the following error:

com.datastax.driver.core.exceptions.ReadFailureException: Cassandra failure during read query at consistency LOCAL_ONE (1 responses were required but only 0 replica responded, 1 failed)
    at com.datastax.driver.core.exceptions.ReadFailureException.copy(ReadFailureException.java:85)
    at com.datastax.driver.core.exceptions.ReadFailureException.copy(ReadFailureException.java:27)
    at com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:37)
    at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:245)
    at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:64)
...my calling classes...

I've checked the Cassandra log and found only this exception:

java.lang.AssertionError: null
    at org.apache.cassandra.db.rows.BTreeRow.getCell(BTreeRow.java:212) ~[apache-cassandra-3.7.jar:3.7]
    at org.apache.cassandra.db.SinglePartitionReadCommand.canRemoveRow(SinglePartitionReadCommand.java:899) [apache-cassandra-3.7.jar:3.7]
    at org.apache.cassandra.db.SinglePartitionReadCommand.reduceFilter(SinglePartitionReadCommand.java:863) [apache-cassandra-3.7.jar:3.7]
    at org.apache.cassandra.db.SinglePartitionReadCommand.queryMemtableAndSSTablesInTimestampOrder(SinglePartitionReadCommand.java:748) [apache-cassandra-3.7.jar:3.7]
    at org.apache.cassandra.db.SinglePartitionReadCommand.queryMemtableAndDiskInternal(SinglePartitionReadCommand.java:519) [apache-cassandra-3.7.jar:3.7]
    at org.apache.cassandra.db.SinglePartitionReadCommand.queryMemtableAndDisk(SinglePartitionReadCommand.java:496) [apache-cassandra-3.7.jar:3.7]
    at org.apache.cassandra.db.SinglePartitionReadCommand.queryStorage(SinglePartitionReadCommand.java:358) [apache-cassandra-3.7.jar:3.7]
    at org.apache.cassandra.db.ReadCommand.executeLocally(ReadCommand.java:366) ~[apache-cassandra-3.7.jar:3.7]
    at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:1797) ~[apache-cassandra-3.7.jar:3.7]
    at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2466) ~[apache-cassandra-3.7.jar:3.7]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_101]
    at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164) [apache-cassandra-3.7.jar:3.7]
    at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:136) [apache-cassandra-3.7.jar:3.7]
    at org.apache.cassandra.concurrent.SEPExecutor.maybeExecuteImmediately(SEPExecutor.java:192) [apache-cassandra-3.7.jar:3.7]
    at org.apache.cassandra.service.AbstractReadExecutor.makeRequests(AbstractReadExecutor.java:117) [apache-cassandra-3.7.jar:3.7]
    at org.apache.cassandra.service.AbstractReadExecutor.makeDataRequests(AbstractReadExecutor.java:85) [apache-cassandra-3.7.jar:3.7]
    at org.apache.cassandra.service.AbstractReadExecutor$NeverSpeculatingReadExecutor.executeAsync(AbstractReadExecutor.java:214) [apache-cassandra-3.7.jar:3.7]
    at org.apache.cassandra.service.StorageProxy$SinglePartitionReadLifecycle.doInitialQueries(StorageProxy.java:1702) [apache-cassandra-3.7.jar:3.7]
    at org.apache.cassandra.service.StorageProxy.fetchRows(StorageProxy.java:1657) [apache-cassandra-3.7.jar:3.7]
    at org.apache.cassandra.service.StorageProxy.readRegular(StorageProxy.java:1604) [apache-cassandra-3.7.jar:3.7]
    at org.apache.cassandra.service.StorageProxy.read(StorageProxy.java:1523) [apache-cassandra-3.7.jar:3.7]
    at org.apache.cassandra.db.SinglePartitionReadCommand.execute(SinglePartitionReadCommand.java:335) [apache-cassandra-3.7.jar:3.7]
    at org.apache.cassandra.service.pager.AbstractQueryPager.fetchPage(AbstractQueryPager.java:67) [apache-cassandra-3.7.jar:3.7]
    at org.apache.cassandra.service.pager.SinglePartitionPager.fetchPage(SinglePartitionPager.java:34) [apache-cassandra-3.7.jar:3.7]
    at org.apache.cassandra.cql3.statements.SelectStatement$Pager$NormalPager.fetchPage(SelectStatement.java:325) [apache-cassandra-3.7.jar:3.7]
    at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:361) [apache-cassandra-3.7.jar:3.7]
    at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:237) [apache-cassandra-3.7.jar:3.7]
    at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:78) [apache-cassandra-3.7.jar:3.7]
    at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:208) [apache-cassandra-3.7.jar:3.7]
    at org.apache.cassandra.cql3.QueryProcessor.processPrepared(QueryProcessor.java:486) [apache-cassandra-3.7.jar:3.7]
    at org.apache.cassandra.cql3.QueryProcessor.processPrepared(QueryProcessor.java:463) [apache-cassandra-3.7.jar:3.7]
    at org.apache.cassandra.transport.messages.ExecuteMessage.execute(ExecuteMessage.java:130) [apache-cassandra-3.7.jar:3.7]
    at org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:507) [apache-cassandra-3.7.jar:3.7]
    at org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:401) [apache-cassandra-3.7.jar:3.7]
    at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) [netty-all-4.0.36.Final.jar:4.0.36.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:292) [netty-all-4.0.36.Final.jar:4.0.36.Final]
    at io.netty.channel.AbstractChannelHandlerContext.access$600(AbstractChannelHandlerContext.java:32) [netty-all-4.0.36.Final.jar:4.0.36.Final]
    at io.netty.channel.AbstractChannelHandlerContext$7.run(AbstractChannelHandlerContext.java:283) [netty-all-4.0.36.Final.jar:4.0.36.Final]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_101]
    at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164) [apache-cassandra-3.7.jar:3.7]
    at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [apache-cassandra-3.7.jar:3.7]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_101]

As you can see, I am using Cassandra 3.7. The DataStax driver in use is version 3.1.0.

Any ideas why the larger data set could cause this error?

Daniel
  • What is entity_state? Is it a User Defined Type? You are getting the error because ColumnDefinition c is null in: ```java public Cell getCell(ColumnDefinition c) { assert !c.isComplex(); return (Cell) BTree.find(btree, ColumnDefinition.asymmetricColumnDataComparator, c); } ``` – Ashraful Islam Oct 08 '16 at 19:16
  • Yes, "entity_state" is a User Defined Type. I've seen that part of the Cassandra code but have no idea why the ColumnDefinition would be null. – Daniel Oct 10 '16 at 09:04

2 Answers


For the number of records that you want to retrieve, it would be worthwhile to use pagination, to retrieve smaller chunks.
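
For example, with the Java driver you can lower the fetch size so each request pulls a smaller chunk (a sketch; the value of 500 is only illustrative):

import com.datastax.driver.core.*;

// The driver pages result sets automatically; setFetchSize controls how
// many rows are pulled from the server per page.
Statement stmt = new SimpleStatement("SELECT * FROM fv.entity_by_identifier")
        .setFetchSize(500);
for (Row row : session.execute(stmt)) {
    // process each row; further pages are fetched transparently
}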

Edit

As explained here, you may be encountering a read timeout; going through the millions of records you refer to may take longer than the read_request_timeout_in_ms threshold (the default is 5 seconds). One option is to increase that threshold.
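
If you raise the server-side threshold in cassandra.yaml, the client-side driver timeout may need to be raised to match; a sketch with the Java driver (the 20-second value is only an illustration):

import com.datastax.driver.core.*;

// Server side: raise read_request_timeout_in_ms in cassandra.yaml.
// Client side: the Java driver's read timeout (12 seconds by default)
// should be at least as large, or the client will give up first.
Cluster cluster = Cluster.builder()
        .addContactPoint("127.0.0.1")
        .withSocketOptions(new SocketOptions().setReadTimeoutMillis(20000))
        .build();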

Carlos Monroy Nieblas
  • I'm only trying to retrieve 1 of the records. Pagination won't make much difference in this situation, surely? – Daniel Oct 07 '16 at 09:02
  • But the query that you posted is trying to retrieve all the records; there is no LIMIT clause. – Carlos Monroy Nieblas Oct 07 '16 at 14:17
  • Even though I'm using a WHERE clause that selects by a single identifier (which is the primary key)? "WHERE identifier=:identifier;" :identifier is a parameter I set to a single identifier, which should reference a single row. – Daniel Oct 07 '16 at 14:52
  • Even though Daniel is loading 5 million rows, I think he just wants to return 1 row, and so is using a WHERE clause with the PRIMARY KEY. – cs94njw Oct 07 '16 at 14:58
  • "going through the millions of records that you refer may be taking longer than the read_request_timeout_in_ms threshold" "identifier" is part of the primary key - so shouldn't it be a simple lookup? – cs94njw Oct 10 '16 at 10:14

Found a solution to the problem.

When using a User Defined Type as a column type, the "frozen" keyword needs to be used.

create table entity_by_identifier (
    identifier text,
    state entity_state,
    PRIMARY KEY(identifier)
);

becomes:

create table entity_by_identifier (
    identifier text,
    state frozen<entity_state>,
    PRIMARY KEY(identifier)
);

You can find information about the "frozen" keyword at: http://docs.datastax.com/en/cql/3.1/cql/cql_using/cqlUseUDT.html and http://docs.datastax.com/en/cql/3.1/cql/cql_reference/create_table_r.html#reference_ds_v3f_vfk_xj__tuple-udt-columns
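
For completeness, reading the frozen UDT back with the Java driver looks roughly like this (a sketch; the 'abc' identifier is just an example value):

import com.datastax.driver.core.*;

// The "state" column comes back as a UDTValue; fields are read by name.
Row row = session.execute(
    "SELECT * FROM fv.entity_by_identifier WHERE identifier = 'abc'").one();
UDTValue state = row.getUDTValue("state");
String entityType = state.getString("entity_type");
int number1 = state.getInt("number1");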

It is still not clear to me, though, why the lack of the "frozen" keyword led to the error I was seeing.

Daniel