3

I have a table where I am logging user activity performed on my website. My table structure looks like:

CREATE TABLE key_space.log (
    id uuid,
    time bigint,
    ip text,
    url text,
    user_id int,
    PRIMARY KEY (id, time)
) WITH CLUSTERING ORDER BY (time DESC)

Now I want to fetch all the records which came in last 5 minutes.

For doing the same, I am using

select * from key_space.log where 
  time>current_timestamp - 5 minutes ALLOW FILTERING;

But this query is not returning any result & i am getting timedoutexception error. How to solve this problem? Any help on this would be really appreciated.

Alex Ott
  • 80,552
  • 8
  • 87
  • 132

2 Answers2

2

As with all Cassandra models, you'll need to start by building a table specifically designed to support that query. Even if you could make it work with your current table, it would have to scan every node in the cluster, which would probably time-out (as you are seeing).

One way to do this will be to use a time "bucket" as a partition key. If you just care about records for the last 5 minutes, then "day" should work, as long as you don't get millions of new records per day. If you do, then you'll need a smaller time component for your "bucket."

CREATE TABLE log_by_day (
    id uuid,
    day bigint,
    time bigint,
    ip text,
    url text,
    user_id int,
    PRIMARY KEY (day, time, id)
) WITH CLUSTERING ORDER BY (time DESC, id ASC);

Now a query like this will work:

aaron@cqlsh:stackoverflow> SELECT day,time,id,user_id FROM log_by_day
  WHERE day=201920 AND time > 1563635871941;

 day    | time          | id                                   | user_id
--------+---------------+--------------------------------------+---------
 201920 | 1563635872259 | 7fef03da-6c23-4bf2-9e98-fd126ab17944 |    1234
 201920 | 1563635872259 | 9a0c49ce-5ad2-45c5-b570-cd9de1c060d1 |    4607
 201920 | 1563635872209 | 9227166e-cda2-4909-b8ac-4168922a0128 |    2112

(3 rows)

Pro-tip: Partitioning on a unique id is great for data distribution, but doesn't give you much in the way of query flexibility.

Aaron
  • 55,518
  • 11
  • 116
  • 132
0

Your table has id as the partition key and time as the clustering key. In order to run range query on clustering key, you need to specify the partition key also. So the query should be

 select * from key_space.log where id="xyz" and time>current_timestamp - 5 minutes ALLOW FILTERING;

This answer describes the different Cassandra keys very well.

Abhishek Garg
  • 2,158
  • 1
  • 16
  • 30