1

I have the following table. Is it possible to get count of followers for each user with a single CQL select?

create table user_follows ( name text, follows_name text, primary key (name,follows_name) );

name    | follows_name
---------+--------------
 indrani |      aravind
 indrani |        jorge
 indrani |      lalitha
 indrani |        vijay
   vijay |      aravind
   vijay |        david
   vijay |         mark
  filmon |        david
  filmon |        jorge
  filmon |      kishore
  filmon |      lalitha
  filmon |         mark
  filmon |        vijay
   david |      aravind
   david |         mark

I have the following query returning count for a single user

select count(follows_name) from user_follows where name='indrani';
Aravind Yarram
  • 78,777
  • 46
  • 231
  • 327

2 Answers2

1

I'm afraid this is not supported. The only thing you can do is to issue your COUNT(*) query for every partition.

If you don't know your partitions you could use PER PARTITION LIMIT 1 to retrieve the first record (and hence the partition key) for each partition (data inside partition is ordered by your cluster key):

SELECT name FROM user_follows PER PARTITION LIMIT 1;

This requires Cassandra 3.6 and later.

Thinking more about it, however, a single COUNT query (with a WHERE restricting by partition) will produce a scan of the partition, because that's how C* to count the records. If you issue this query for all of your partitions then you'll scan all your dataset, and probably will kill your cluster.

If you don't need to be precise you could create a counter table with the same partition key and increment/decrement the counter for each user:

CREATE TABLE user_follows_counts (
    name text PRIMARY KEY, 
    followers counter,
);

Now you'd get what you want without scanning your dataset, with a much more efficient query:

SELECT * FROM user_follows_counts;

Check the documentation on counters for further information.

xmas79
  • 5,060
  • 2
  • 14
  • 35
0

This is possible only if you use Custom UDF in cassandra.

Ideal query for this in rdbms world would be

Select name, COUNT(*) FROM table_name GROUP BY name;

But since this is not supported in cassandra directly, you can write custom UDF to Group using Map instead.

Refer to https://stackoverflow.com/a/33223749/2990458

Community
  • 1
  • 1
Abhishek Anand
  • 1,940
  • 14
  • 27
  • This will scan all the dataset. Instead, a dedicated table is the best solution, and by a good margin. – xmas79 Nov 11 '16 at 09:16
  • @xmas79 you are right. But I guess it depends or read vs write frequency of the query. If this query is run sheldom, and updates/delete are frequent on user_follows table, counter might be overkill. Other way around your solution will work fine. – Abhishek Anand Nov 11 '16 at 17:13