1

I have a query in Cassandra

select count(pk1) from tableA where pk1=xyz

Table is :

create table tableA 
(
    pk1 uuid,
    pk2 int,
    pk3 text,
    pk4 int,
    fc1 int,
    fc2 bigint,
    ....
    fcn blob,
    primary key (pk1, pk2 , pk3 , pk4)

The query is executed often and takes up to 2s to execute.

I am wondering if there will be any performance gain if refactoring to:

select count(1) from tableA where pk = xyz
Bogdan
  • 702
  • 3
  • 6
  • 22

2 Answers2

2

Based on the documentation here, there is no difference between count(1) and count(*).

Generally speaking COUNT(1) and COUNT(*) will both return the number of rows that match the condition specified in your query

This is in line with how traditional SQL databases are implemented.

COUNT ( { [ [ ALL | DISTINCT ] expression ] | * } )

Count(1) is a condition that always evaluates to true.

Also, Count(Column_name) only returns the Non-Null values.

Since in your case because of where condition the "Non-null" is a non-factor, I don't think there will be any difference in performance in using one over the other. This answer tried confirming the same using some performance tests.

Abhishek Garg
  • 2,158
  • 1
  • 16
  • 30
0

In general COUNT is not at all recommended in Cassandra . As it’s going to scan through multiple nodes and get your answer back . And I’m not sure the count you get is really consistent.

  • Thanks for your contribution to the community! Generally speaking, you are correct about `COUNT()` with C*. I explained it in this post -- https://community.datastax.com/questions/6897/. But for the case above, it won't trigger a table scan because it's restricted to a single partition so will only scan the rows in that partition. Cheers! – Erick Ramirez Aug 01 '20 at 00:40