2

When I run this query the count(0) returns 21 for the set with zip='80005'.

select zip, avg(value), min(value), max(value), count(0) from values group by zip order by zip

There are really 109 rows with zip='80005'.

The following two queries both show 109 rows and they also return different values for min, max, and avg.

select avg(value), min(value), max(value), count(value) from values where zip='80005'

select zip, avg(value), min(value), max(value), count(value) from values group by zip having zip='80005'

There are no nulls for value.

Is there any reason why the first query is returning the wrong number of rows in the set for zip='80005'?

Maybe this is a bug in Postgresql.

Dean Schulze
  • 9,633
  • 24
  • 100
  • 165
  • is not that count(0) = 21 the number of rows that it is returning your query?, you dont have a where in the first query – Hector Sanchez Aug 11 '11 at 22:55
  • I don't need a where clause in the first query. I want to compute avg, min, and max for each zip code. The results from the first query for zip 80005 differ from what I find when I search for that zip 80005 alone. – Dean Schulze Aug 12 '11 at 13:41

4 Answers4

1

I think you want count(*) not count(0)...

Chris
  • 1,401
  • 4
  • 17
  • 28
  • count(*) and count(0) give the same result: 21 – Dean Schulze Aug 12 '11 at 13:39
  • 1
    @Chris: There is no difference between `COUNT(*)`, `COUNT(0)` or `COUNT(1)` in PostgreSQL. The `COUNT` function counts all non-NULL values. `COUNT(*)` doesn't evaluate the values, and works like `COUNT(0)`. However, if you use `COUNT(value)` you'll get the count of non-NULL values. `COUNT(*)` and other fixed value non-NULL counts return the number of affected rows. – jmz Aug 13 '11 at 12:38
  • These will always give the same result, so it isn't the source of the problem. – Andrew Lazarus Aug 24 '11 at 18:32
0

Count(1), Count(*) and Count([field]) all work.

There was one source I read that said there was a performance difference between Count(1) and Count(*) (allegedly, count(*) required more processing), however, there seems to be evidence at least for Postgress & TSQL that it doesn't make a difference.

Chains
  • 12,541
  • 8
  • 45
  • 62
  • What ever makes you think that? COUNT(data field) requires the query to fetch the data field, COUNT(*) or COUNT(some constant) do not. All versions require a sequential scan of the table, though, in order to access MVCC/transaction information to see if the row is visible to the current transaction. – aib Aug 11 '11 at 23:54
  • Read about count(*) versus count(1) here (scroll down to the performance tip, or just CTRL+F for count(1)): http://www.techonthenet.com/sql/count.php. Maybe the documentation is wrong, or maybe it's different for different sql dialects. I guess you could test it out to make sure. – Chains Aug 12 '11 at 02:07
  • Um, it does not name a database server nor does it cite a source. See http://www.postgresql.org/docs/current/static/functions-aggregate.html for Postgres. Also, please note the case where an unoptimized engine might check the expression `1` for NULLness, making it slower than a simple row count with COUNT(*). – aib Aug 12 '11 at 08:59
  • Oh, haha, there actually *was* such a case: http://stackoverflow.com/questions/1221559/count-vs-count1 (search for 'oracle') – aib Aug 12 '11 at 09:00
0

At first glance, this looks like a bug, hard to say without seeing your data.

You could try and narrow down the source of the problem.

If this :-

  select zip, avg(value), min(value), max(value), count(value) 
  from values group by zip having zip='80005'

returns 109, but this :-

select zip, avg(value), min(value), max(value), count(0) 
from values group by zip order by zip

returns 21 for 80005, what is returned for 80005 when you do this :-

  select zip, avg(value), min(value), max(value), count(value) 
  from values group by zip
steve godfrey
  • 1,234
  • 7
  • 14
0

Look and see if you have records with zip 80005b where b is one or more trailing blanks. Those would collate somewhere else in your grouped zip list. I believe by default PG will ignore trailing blanks when testing two strings for equality, but not in a GROUP BY. (Do you have a sanity check constraint on the zip field?)

Andrew Lazarus
  • 18,205
  • 3
  • 35
  • 53