2

In the table my_obj there are two integer fields:

(value_a integer, value_b integer);

I try to compute how many time value_a = value_b, and I want to express this ratio in percents. This is the code I have tried:

select sum(case when o.value_a = o.value_b then 1 else 0 end) as nb_ok,
       sum(case when o.value_a != o.value_b then 1 else 0 end) as nb_not_ok,
       compute_percent(nb_ok,nb_not_ok)
from  my_obj as o
group by o.property_name;

compute_percent is a stored_procedure that simply does (a * 100) / (a + b)

But PostgreSQL complains that the column nb_ok doesn't exist.
How would you do that properly ?

I use PostgreSQL 9.1 with Ubuntu 12.04.

Stephane Rolland
  • 38,876
  • 35
  • 121
  • 169
  • `But postgresql complains that the column nook doesn't exist.` ? Please fix your question, there is no `nook` anywhere. When talking about an error message, put it in your question **as is**. Copy / Paste. – Erwin Brandstetter Mar 27 '13 at 01:50
  • Consider the last paragraph of my answer and follow the link to the manual to understand why `nbok` appears in lower case. – Erwin Brandstetter Mar 27 '13 at 14:08
  • 1
    Upper-casing of SQL keywords is totally optional and a matter of taste. But lower casing of identifiers is not. – Erwin Brandstetter Mar 27 '13 at 14:13

3 Answers3

3

There is more to this question than it may seem.

Simple version

This is much faster and simpler:

SELECT property_name
      ,(count(value_a = value_b OR NULL) * 100) / count(*) AS pct
FROM   my_obj
GROUP  BY 1;

Result:

property_name | pct
--------------+----
 prop_1       | 17
 prop_2       | 43

How?

  • You don't need a function for this at all.

  • Instead of counting value_b (which you don't need to begin with) and calculating the total, use count(*) for the total. Faster, simpler.

  • This assumes you don't have NULL values. I.e. both columns are defined NOT NULL. The information is missing in your question.
    If not, your original query is probably not doing what you think it does. If any of the values is NULL, your version does not count that row at all. You could even provoke a division-by-zero exception this way.
    This version works with NULL, too. count(*) produces the count of all rows, regardless of values.

  • Here's how the count works:

     TRUE  OR NULL = TRUE
     FALSE OR NULL = NULL
    

    count() ignores NULL values. Voilá.

  • Operator precedence governs that = binds before OR. You could add parentheses to make it clearer:

    count ((value_a = value_b) OR FALSE)
    
  • You can do the same with

    count NULLIF(<expression>, FALSE)
    
  • The result type of count() is bigint by default.
    A division bigint / bigint, truncates fractional digits.

Include fractional digits

Use 100.0 (with fractional digit) to force the calculation to be numeric and thereby preserve fractional digits.
You may want to use round() with this:

SELECT property_name
      ,round((count(value_a = value_b OR NULL) * 100.0) / count(*), 2) AS pct
FROM   my_obj
GROUP  BY 1;

Result:

property_name | pct
--------------+-------
 prop_1       | 17.23
 prop_2       | 43.09

As an aside:
I use value_a instead of valueA. Don't use unquoted mixed-case identifiers in PostgreSQL. I have seen too many desperate question coming from this folly. If you wonder what I am talking about, read the chapter Identifiers and Key Words in the manual.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Does this `COUNT` run faster than the similar SUM((value_a=value_b)::integer) or the analogous `SUM(CASE...)`? – Andrew Lazarus Mar 29 '13 at 18:27
  • @AndrewLazarus: The difference is generally marginal, as each of the three forms is very cheap compared to reading data from disk. In my last test `OR` came before `CASE` and `::int`. But `OR`, `CASE` an `NULLIF` were "to close to call", really. You can test this easily. Here is a [recent similar test on dba.SE](http://dba.stackexchange.com/a/27572/3684) (but without an `OR` version). – Erwin Brandstetter Mar 29 '13 at 20:10
2

Probably the easiest way to do is to just use a with clause

WITH data 
     AS (SELECT Sum(CASE WHEN o.valuea = o.valueb THEN 1 ELSE 0 END) AS nbOk, 
                Sum(CASE WHEN o.valuea != o.valueb THEN 1 ELSE 0 END) AS nbNotOk, 
         FROM   my_obj AS o 
         GROUP  BY o.property_name) 
SELECT nbok, 
       nbnotok, 
       Compute_percent(nbok, nbnotok) 
FROM   data
Stephane Rolland
  • 38,876
  • 35
  • 121
  • 169
Conrad Frix
  • 51,984
  • 12
  • 96
  • 155
  • Supposing a covering index on one/both columns, will the optimizer be able to use them for this query (and in what capacity), or will it be a strict table scan? – Clockwork-Muse Mar 26 '13 at 18:39
  • @Clockwork-Muse it works just like an in-line query. It will use whatever it can to build the `data` but after that everything is done in memory. – Conrad Frix Mar 26 '13 at 18:49
  • @ConradFrix I made a mistake in my question, i'm going to edit, the `group by` is `group by o.property_name`. It doesn't make sense otherwise. – Stephane Rolland Mar 26 '13 at 18:59
1

You might also want to try this version:

WITH all(count) as (SELECT COUNT(*)
                    FROM my_obj),
     matching(count) as (SELECT COUNT(*)
                         FROM my_obj
                         WHERE valueA = valueB)
SELECT nbOk, nbNotOk, Compute_percent(nbOk, nbNotOk)
FROM (SELECT matching.count as nbOk, all.count - matching.count as nbNotOk
      FROM all
      CROSS JOIN matching) data
Clockwork-Muse
  • 12,806
  • 6
  • 31
  • 45