I have a requirement to create a table with id, creation date, time and other fields. I need to query some 'n' records based on latest creation date and time. In order to achieve this, how the table should be? Creation with id as partition key, and date and time as cluster keys work? Is it possible to do order by for all ids i.e., without any condition on partition key?
-
This link may provide some insight about the concept https://stackoverflow.com/questions/24949676/difference-between-partition-key-composite-key-and-clustering-key-in-cassandra – Ersoy May 12 '20 at 10:54
1 Answers
Creation with
id
as partition key, and date and time as cluster keys work?
Technically you could, but the performance would be abysmal. Using id
as the partition key would put all rows on different nodes, regardless of timestamp. As you want to query by a time range, you should also store your data by a time component.
Is it possible to do order by for all
ids
i.e., without any condition on partition key?
No, not at all possible. Cassandra is designed to perform sequential reads. Therefore, result set ordering in Cassandra is dependent on data being already stored in the order that you want to retrieve it.
This is how I would do it:
CREATE TABLE events (
id UUID,
month_bucket INT,
creation TIMESTAMP,
name TEXT,
PRIMARY KEY ((month_bucket),creation,id))
WITH CLUSTERING ORDER BY (creation DESC, id ASC);
After inserting some data, this query works:
aploetz@cqlsh:stackoverflow> SELECT * FROM events
WHERE month_bucket=202005
AND creation >= '2020-05-01' AND creation < '2020-05-13';
month_bucket | creation | id | name
--------------+---------------------+-------------+----------------
202005 | 2020-05-05 05:00:00 | 1d073ed5... | Cinco de Mayo
202005 | 2020-05-04 05:00:00 | 3cca75a0... | May the Fourth
202005 | 2020-05-01 05:00:00 | fd2bb58f... | May Day
(3 rows)
This utilizes a modeling technique known as "bucketing." Since I'm not sure of the number of entries that can happen for a given time period (only you know that), I guessed and used "month" as an example. Basically, you'll want to choose a time "bucket" that is both:
- big enough to be useful (provide the queried data)
- small enough to stay under Cassandra's 2GB/2 billion cells per partition requirement.
Then, provide the bucketing value with the query, along with a range component on creation
.
The CLUSTERING ORDER
definition keeps the data stored by creation
in a descending order. id
is added to the end as a way to ensure uniqueness. I've used this pattern to solve range query models on date/time components several times.

- 55,518
- 11
- 116
- 132