COUNT(*) vs. COUNT(1) vs COUNT(column) with WHERE condition performance in Cassandra

Question

I have a query in Cassandra

select count(pk1) from tableA where pk1=xyz

Table is :

create table tableA 
(
    pk1 uuid,
    pk2 int,
    pk3 text,
    pk4 int,
    fc1 int,
    fc2 bigint,
    ....
    fcn blob,
    primary key (pk1, pk2 , pk3 , pk4)

The query is executed often and takes up to 2s to execute.

I am wondering if there will be any performance gain if refactoring to:

select count(1) from tableA where pk = xyz

Do you need an exact count? You can get estimates with nodetool. — Brad Schoening, Aug 03 '20 at 02:05

score 2 · Answer 1 · answered Jul 31 '20 at 17:20

Based on the documentation here, there is no difference between count(1) and count(*).

Generally speaking COUNT(1) and COUNT(*) will both return the number of rows that match the condition specified in your query

This is in line with how traditional SQL databases are implemented.

COUNT ( { [ [ ALL | DISTINCT ] expression ] | * } )

Count(1) is a condition that always evaluates to true.

Also, Count(Column_name) only returns the Non-Null values.

Since in your case because of where condition the "Non-null" is a non-factor, I don't think there will be any difference in performance in using one over the other. This answer tried confirming the same using some performance tests.

Excellent response. Thanks! – Erick Ramirez Aug 01 '20 at 00:35 — Erick Ramirez, Aug 01 '20 at 00:35

score 0 · Answer 2 · answered Jul 31 '20 at 17:20

0

In general COUNT is not at all recommended in Cassandra . As it’s going to scan through multiple nodes and get your answer back . And I’m not sure the count you get is really consistent.

answered Jul 31 '20 at 17:20

Nirmal Singh K P S

1

Thanks for your contribution to the community! Generally speaking, you are correct about `COUNT()` with C*. I explained it in this post -- https://community.datastax.com/questions/6897/. But for the case above, it won't trigger a table scan because it's restricted to a single partition so will only scan the rows in that partition. Cheers! – Erick Ramirez Aug 01 '20 at 00:40

COUNT(*) vs. COUNT(1) vs COUNT(column) with WHERE condition performance in Cassandra

2 Answers2