I covered most of what relates to the question in the comments, but I would like to add one remark.
Personally, when I was in a similar situation with Cassandra, I abused the properties it has. This is sort of a hack, but I figured it might be useful in this context.
Basically, I created a single side table where I put all the unique values, i.e.:
CREATE TABLE stats_unique (
stat_group text,
user_id text,
PRIMARY KEY (stat_group, user_id)
);
Writes are usually cheap, and I had no trouble with one additional simple
write; after all, Cassandra was built for this. So every time I inserted
into the base table, I also inserted into the stats_unique
table. Inserting a primary key that already exists just overwrites that row, so duplicates never add rows. For your example it would be something like:
INSERT INTO stats_unique (stat_group, user_id) VALUES ('users', '1');
INSERT INTO stats_unique (stat_group, user_id) VALUES ('users', '2');
INSERT INTO stats_unique (stat_group, user_id) VALUES ('users', '1');
INSERT INTO stats_unique (stat_group, user_id) VALUES ('users', '3');
INSERT INTO stats_unique (stat_group, user_id) VALUES ('users', '1');
INSERT INTO stats_unique (stat_group, user_id) VALUES ('users', '2');
INSERT INTO stats_unique (stat_group, user_id) VALUES ('users', '2');
INSERT INTO stats_unique (stat_group, user_id) VALUES ('users', '3');
INSERT INTO stats_unique (stat_group, user_id) VALUES ('users', '4');
INSERT INTO stats_unique (stat_group, user_id) VALUES ('users', '1');
INSERT INTO stats_unique (stat_group, user_id) VALUES ('users', '2');
INSERT INTO stats_unique (stat_group, user_id) VALUES ('users', '1');
INSERT INTO stats_unique (stat_group, user_id) VALUES ('users', '3');
And then when I needed the unique count, I just issued a simple query like:
SELECT COUNT(1) FROM stats_unique WHERE stat_group = 'users';
count
-------
4
(1 rows)
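To make it clearer why the repeated inserts collapse to 4, here is a minimal in-memory sketch in Python. It is not Cassandra driver code; the `insert` and `count` helpers are hypothetical and only model the idea that rows sharing a primary key (stat_group, user_id) overwrite each other.

```python
from collections import defaultdict

# In-memory stand-in for stats_unique (an assumption for illustration):
# partition key (stat_group) -> set of clustering keys (user_id).
stats_unique = defaultdict(set)


def insert(stat_group, user_id):
    """Model an INSERT: an existing primary key is overwritten, so no new row appears."""
    stats_unique[stat_group].add(user_id)


def count(stat_group):
    """Model SELECT COUNT(1) ... WHERE stat_group = ?: count rows in one partition."""
    return len(stats_unique.get(stat_group, set()))


# Same sequence of user ids as the INSERT statements above.
for user_id in ["1", "2", "1", "3", "1", "2", "2", "3", "4", "1", "2", "1", "3"]:
    insert("users", user_id)

print(count("users"))  # prints 4
```

Thirteen writes, four rows: the deduplication comes from the primary key itself, not from any extra logic on the write path.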
This is by no means a standard solution, but it was something
that worked in my particular case. Keep in mind that I couldn't
hold more than a couple of million entries in this single partition,
but the system simply didn't have to support that many entity instances,
so for my use case it was good enough. Also, with this hack you might run into problems such as timeouts when counting.
It would be best to have something on the side do this count: a separate process, a script, or even, as Ashraful Islam mentioned in his comment, a Spark process that does the count for you and writes it to some other table in Cassandra or another storage technology.
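As a rough idea of what such a side process could look like, here is a sketch in Python that counts the partition in small pages instead of one big COUNT, which is one way to stay clear of timeouts. Everything here is an assumption: `count_partition`, `fetch_page` and `make_fake_fetch` are hypothetical names, and the fake fetcher only stands in for a real query along the lines of `SELECT user_id FROM stats_unique WHERE stat_group = ? AND user_id > ? LIMIT ?`.

```python
def count_partition(fetch_page, page_size=1000):
    """Count rows by walking the partition page by page.

    fetch_page(after, limit) is a hypothetical callable that returns up to
    `limit` user ids, in clustering order, strictly greater than `after`
    (or from the start of the partition when `after` is None).
    """
    total = 0
    last_seen = None
    while True:
        page = fetch_page(last_seen, page_size)
        total += len(page)
        if len(page) < page_size:
            # A short (or empty) page means the partition is exhausted.
            return total
        last_seen = page[-1]


def make_fake_fetch(user_ids):
    """Build an in-memory fetch_page for demonstration (not a Cassandra call)."""
    ordered = sorted(set(user_ids))

    def fetch_page(after, limit):
        rows = [u for u in ordered if after is None or u > after]
        return rows[:limit]

    return fetch_page


fetch = make_fake_fetch(["1", "2", "1", "3", "4"])
unique_users = count_partition(fetch, page_size=2)
print(unique_users)  # prints 4
```

The result could then be written to a separate table on a schedule, so readers never have to run the count themselves.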
What I used might be a Cassandra anti-pattern (hot row, etc.), but it worked for me.