3

With the following table definition:

CREATE TABLE device_by_create_date (
    year int,
    comm_nr text,
    created_at timestamp,
    PRIMARY KEY ((year, comm_nr), created_at)
) WITH CLUSTERING ORDER BY (created_at DESC)

Comm_nr is a unique identifier.

I would expect to see data ordered by created_at column, which is not the case when I add data.

Example entries:

Table CQL:

How can I issue select * from table; queries, which return data ordered by the created_at row?

double-beep
  • 5,031
  • 17
  • 33
  • 41
Alex Tbk
  • 2,042
  • 2
  • 20
  • 38

1 Answers1

4

TLDR: You need to create a new table.

Your partition key is (year, comm_nr). You're created_at key is ordered but it is ordered WITHIN that partition key. A query where SELECT * FROM table WHERE year=x AND comm_nr=y; will be ordered by created_at.

Additionally if instead of (year, comm_nr), created_at your key was instead year, comm_r, created_at even if your create table syntax only specifiied created_at as the having a clustering order, it would be created as WITH CLUSTERING ORDER BY (comm_nr DESC, created_at DESC). Data is sorted within SSTables by key from left to right.

The way to do this in true nosql fashion is to create a separate table where your key is instead year, created_at, comm_nr. You would write to both on user creation, but if you needed the answer for who created their account first you would instead query the new table.

Highstead
  • 2,291
  • 3
  • 26
  • 30
  • Thank you! Lets say I want to display all devices from 2017 and 2018. In this case I would issue 2 queries - what about paging the result? PagingState is only available per ResultSet. Custom implentation? – Alex Tbk Feb 28 '18 at 18:04
  • You should be able to get away with `SELECT * FROM table WHERE year IN (2017, 2018) ...` This will be "1 query" from the application side and should be ordered in that fashion. Understand though that the 1 query will hit `*` nodes when it tries to achieve consistency. So your cluster will be doing more work than when on a regular query. – Highstead Feb 28 '18 at 18:13
  • 2
    Thats right, but wasnt the usage of IN discouraged? – Alex Tbk Feb 28 '18 at 18:15
  • Alternatively, you could instead do the coordination(custom implementation) of that sorting in the application layer as TYPICALLY the application layer will scale better than the Data layer. `IN` is discouraged because of that scaling issue above. You're putting more work on the coordinator node (memory pressure and network) and you're hitting more of the cluster with one query. – Highstead Feb 28 '18 at 18:16