I have installed Apache Cassandra as a single-node cluster. When I create a column family, the Murmur3 partitioner partitions the data based on the primary keys, so the table does not preserve the order of the primary keys. The SSTable output that I see is sorted by position, but the order of the primary keys has changed.

For my requirement, I do not want the order of the primary keys to be shuffled. So, how do I change the partitioning scheme of Cassandra? I looked into the cassandra.yaml file, but it gives no instructions on how to change from the default Murmur3 partitioner. Would there be any impact if the default is changed?

This is the table I created:

    CREATE TABLE ycsb.expt (
        y_id varchar,
        field0 varchar,
        field1 varchar,
        field2 varchar,
        PRIMARY KEY (y_id, field0)
    ) WITH CLUSTERING ORDER BY (field0 ASC);

After adding data to the table, this is my output when I run "select * from expt":

     y_id   | field0     | field1       | field2
    --------+------------+--------------+------------
     user48 |   ?O3 :<5[ |       *B-0Qa |          .
     user14 |         .J |     (=~/0`"4 |         03
     user40 |       (Uu' |          +.0 |          ;
     user42 |         // |          ((* |         3O
      user8 |          , |     =Ao3[??< |   4.2(Hm6O

I want this output in the same order in which I inserted the data, and I inserted it in sorted order (e.g. user8, user14, user40). Despite the clustering key, the data has been shuffled around.

How do I ensure that the output is in sorted order for the table above?

Nitin

1 Answer

The "partitioner" setting in cassandra.yaml determines which partitioner the cluster uses. You have 3 possibilities here, but I suspect that you really do want the Murmur3 partitioner.

The documentation can explain further how the other choices work: https://docs.datastax.com/en/cassandra/3.0/cassandra/architecture/archPartitionerAbout.html

But it doesn't sound like your problem is with the partitioner; it sounds like a problem with your data model. If your requirements dictate an order for your rows, you should reevaluate the model so that it has a suitable clustering key. Note that the clustering key is separate from the partition key, which determines which partition a row falls in.
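As a rough sketch of how the two keys behave on the table from the question (the WHERE value is just one of the sample rows shown in the question's output), queries like these should make the difference visible:

```cql
-- A full scan returns partitions ordered by the token of the partition key,
-- not by the y_id value itself.
SELECT token(y_id), y_id, field0 FROM ycsb.expt;

-- Inside a single partition, rows are ordered by the clustering column field0.
SELECT * FROM ycsb.expt WHERE y_id = 'user48';
```

The first query shows why the y_id values look shuffled: they are being read back in token order.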

See the answer posted here for an explanation of the various ways you can configure a primary key: Difference between partition key, composite key and clustering key in Cassandra?

Once you have a clustering key you are happy with, you will be able to use the ORDER BY directive on those columns in your CQL queries.
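A minimal sketch of what that could look like, assuming a redesigned table where y_id becomes a clustering column. The table name expt_sorted and the bucket column are hypothetical and not part of the original schema:

```cql
-- Hypothetical redesign: bucket is an assumed partition key, y_id is now a clustering column.
CREATE TABLE ycsb.expt_sorted (
    bucket int,
    y_id varchar,
    field0 varchar,
    field1 varchar,
    field2 varchar,
    PRIMARY KEY ((bucket), y_id, field0)
) WITH CLUSTERING ORDER BY (y_id ASC, field0 ASC);

-- Rows in one partition come back in clustering order (y_id, then field0).
SELECT * FROM ycsb.expt_sorted WHERE bucket = 0;

-- ORDER BY is only accepted when the partition key is restricted.
SELECT * FROM ycsb.expt_sorted WHERE bucket = 0 ORDER BY y_id DESC;
```

Two caveats with this sketch: y_id is text, so it sorts lexicographically ('user14' comes before 'user8'), and putting every row under a single bucket value keeps all the data in one partition, which may not scale beyond a small experiment.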

Razzle Dazzle
  • I have added more detail to the question by giving the table format and the output that I am getting. Please check it out and suggest what could be done. I have used the clustering key as suggested by you. Since I did not get the sorted order here, I thought of changing the partitioner to order preserving. – Nitin Mar 12 '19 at 06:38
  • Your partition key should be what determines WHERE (in relation to the nodes/token ring) the data resides, the clustering key determines how the data is ordered within that partition. If you want to order by y_id you need to choose that as the clustering column and choose a different column to be the partition key (or create a new one). If you need to enforce order based on order inserted I would maybe suggest utilizing a counter column. https://docs.datastax.com/en/cql/3.3/cql/cql_using/useCountersConcept.html Then you would use this as the clustering key and it would preserve order. – Razzle Dazzle Mar 12 '19 at 15:28
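
A sketch of the insertion-order idea, with one substitution stated plainly: Cassandra does not allow a counter column to be part of the primary key, so this example uses a timeuuid clustering column instead. The table name expt_by_insert and the bucket and inserted_at columns are hypothetical, as are the sample values:

```cql
-- Hypothetical table: bucket and inserted_at are assumed columns, not from the original schema.
CREATE TABLE ycsb.expt_by_insert (
    bucket int,
    inserted_at timeuuid,
    y_id varchar,
    field0 varchar,
    field1 varchar,
    field2 varchar,
    PRIMARY KEY ((bucket), inserted_at)
) WITH CLUSTERING ORDER BY (inserted_at ASC);

-- now() generates a time-based UUID at write time.
INSERT INTO ycsb.expt_by_insert (bucket, inserted_at, y_id, field0, field1, field2)
VALUES (0, now(), 'user8', 'a', 'b', 'c');

-- Rows in the partition are returned in the order of their inserted_at timestamps.
SELECT y_id, field0, field1, field2 FROM ycsb.expt_by_insert WHERE bucket = 0;
```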