SQL count(*) giving wrong answer when used with group by

Question

When I run this query the count(0) returns 21 for the set with zip='80005'.

select zip, avg(value), min(value), max(value), count(0) from values group by zip order by zip

There are really 109 rows with zip='80005'.

The following two queries both show 109 rows and they also return different values for min, max, and avg.

select avg(value), min(value), max(value), count(value) from values where zip='80005'

select zip, avg(value), min(value), max(value), count(value) from values group by zip having zip='80005'

There are no nulls for value.

Is there any reason why the first query is returning the wrong number of rows in the set for zip='80005'?

Maybe this is a bug in Postgresql.

is not that count(0) = 21 the number of rows that it is returning your query?, you dont have a where in the first query — Hector Sanchez, Aug 11 '11 at 22:55
I don't need a where clause in the first query. I want to compute avg, min, and max for each zip code. The results from the first query for zip 80005 differ from what I find when I search for that zip 80005 alone. — Dean Schulze, Aug 12 '11 at 13:41

score 1 · Answer 1 · answered Aug 11 '11 at 22:49

1

I think you want count(*) not count(0)...

answered Aug 11 '11 at 22:49

Chris

1,401
4
17
28

count(*) and count(0) give the same result: 21 – Dean Schulze Aug 12 '11 at 13:39
1

@Chris: There is no difference between `COUNT(*)`, `COUNT(0)` or `COUNT(1)` in PostgreSQL. The `COUNT` function counts all non-NULL values. `COUNT(*)` doesn't evaluate the values, and works like `COUNT(0)`. However, if you use `COUNT(value)` you'll get the count of non-NULL values. `COUNT(*)` and other fixed value non-NULL counts return the number of affected rows. – jmz Aug 13 '11 at 12:38
These will always give the same result, so it isn't the source of the problem. – Andrew Lazarus Aug 24 '11 at 18:32

Chains · Answer 2 · 2011-08-12T15:14:53.037

0

Count(1), Count(*) and Count([field]) all work.

There was one source I read that said there was a performance difference between Count(1) and Count(*) (allegedly, count(*) required more processing), however, there seems to be evidence at least for Postgress & TSQL that it doesn't make a difference.

edited Aug 12 '11 at 15:14

answered Aug 11 '11 at 23:15

Chains

12,541
8
45
62

What ever makes you think that? COUNT(data field) requires the query to fetch the data field, COUNT(*) or COUNT(some constant) do not. All versions require a sequential scan of the table, though, in order to access MVCC/transaction information to see if the row is visible to the current transaction. – aib Aug 11 '11 at 23:54
Read about count(*) versus count(1) here (scroll down to the performance tip, or just CTRL+F for count(1)): http://www.techonthenet.com/sql/count.php. Maybe the documentation is wrong, or maybe it's different for different sql dialects. I guess you could test it out to make sure. – Chains Aug 12 '11 at 02:07
Um, it does not name a database server nor does it cite a source. See http://www.postgresql.org/docs/current/static/functions-aggregate.html for Postgres. Also, please note the case where an unoptimized engine might check the expression `1` for NULLness, making it slower than a simple row count with COUNT(*). – aib Aug 12 '11 at 08:59
Oh, haha, there actually *was* such a case: http://stackoverflow.com/questions/1221559/count-vs-count1 (search for 'oracle') – aib Aug 12 '11 at 09:00

score 0 · Answer 3 · answered Aug 13 '11 at 09:28

At first glance, this looks like a bug, hard to say without seeing your data.

You could try and narrow down the source of the problem.

If this :-

  select zip, avg(value), min(value), max(value), count(value) 
  from values group by zip having zip='80005'

returns 109, but this :-

select zip, avg(value), min(value), max(value), count(0) 
from values group by zip order by zip

returns 21 for 80005, what is returned for 80005 when you do this :-

  select zip, avg(value), min(value), max(value), count(value) 
  from values group by zip

score 0 · Answer 4 · answered Aug 24 '11 at 18:31

Look and see if you have records with zip 80005b where b is one or more trailing blanks. Those would collate somewhere else in your grouped zip list. I believe by default PG will ignore trailing blanks when testing two strings for equality, but not in a GROUP BY. (Do you have a sanity check constraint on the zip field?)

SQL count(*) giving wrong answer when used with group by

4 Answers4