1

Suppose I'm trying to put together a query with multiple conditions:

select * from t
where (Condition1 OR Condition2 OR Condition3)

My goal is to group and order the results such that:

Group 1 => C1 = true,  C2 = true,  C3 = true

Group 2 => C1 = true,  C2 = true,  C3 = false
       OR  C1 = true,  C2 = false, C3 = true
       OR  C1 = false, C2 = true,  C3 = true

Group 3 => C1 = true,  C2 = false, C3 = false
       OR  C1 = false, C2 = true,  C3 = false
       OR  C1 = false, C2 = false, C3 = true

Where C1 is Condition1, and so on. So, if you were to convert each condition into a 1 or a 0, 1 being true 0 being false, you'd get a binary number list:

111

110
101
011

100
010
001

Note that they aren't simply ordered greatest to least.

I'm just a little curious, because I thought that normally an SQL DBMS will not bother looking at other clauses in an OR chain if the first one turns out to be true, for instance. How would I convert, record by record, the results of each condition into a number that could be sorted against?

I want the solution to be expandable to a case with N conditions, is there an elegant solution for this? It also has to be very fast. I'm heavily invested in Postgres. Is there a Postgres-specific function that will help?

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
Isaac Bolinger
  • 7,328
  • 11
  • 52
  • 90
  • It's not really a solution but Solr or Elasticsearch seems like a better DB to do this. – HopAlongPolly Oct 14 '15 at 02:05
  • Since your invested in Postgres does this link help out? http://stackoverflow.com/questions/5396498/postgresql-sql-count-of-true-values – HopAlongPolly Oct 14 '15 at 02:09
  • *because I thought that normally an SQL DBMS will not bother looking at other clauses in an OR chain if the first one turns out to be true*. Nope. The order of clauses has no significance, the DB might execute them in whatever order it thinks will be fastest. There is no short-circuit OR in SQL. – Craig Ringer Oct 14 '15 at 02:09
  • @Craig, that makes a lot of sense. I guess it would be a big boon to execute the conditions hitting the indexes first wouldn't it. I've gotten used to that behavior in code over the years. – Isaac Bolinger Oct 14 '15 at 02:12
  • @Craig, Supposing the DBMS finds a true condition in an OR chain on its first try, it wouldn't execute the rest of them, would it? – Isaac Bolinger Oct 14 '15 at 02:15
  • What you are seeking will slow you down - quite a lot - because you will have to use a set of case expressions to eval each condition, do that for the whole set, then filter. Have fun trying to making it generic too. Consider the resources Google employs to rank pages. – Paul Maxwell Oct 14 '15 at 02:16
  • @IsaacBolinger Probably not, but it might. It may evaluate conditions in ways that don't map exactly to the SQL too - interleaved/overlapping, etc. You can't rely on that for, e.g. avoiding zero divide errors. – Craig Ringer Oct 14 '15 at 02:18
  • Hmm, maybe I need a new strategy. – Isaac Bolinger Oct 14 '15 at 02:19

1 Answers1

3

The discussion in the comments of how WHERE conditions are evaluated seems orthogonal to the question asked - for which there are simple solutions.

Rank query results based on how many of the where conditions are satisfied

SELECT *
FROM   tbl
WHERE (Condition1 OR Condition2 OR Condition3)
ORDER  BY ((Condition1) IS TRUE)::int
        + ((Condition2) IS TRUE)::int
        + ((Condition3) IS TRUE)::int DESC;

Or use the more verbose standard syntax for the casting like you (@Isaac) supplied yourself:

ORDER  BY CAST((Condition1) IS TRUE AS integer)
        + CAST((Condition2) IS TRUE AS integer)
        + CAST((Condition3) IS TRUE AS integer) DESC;

The underlying principle is this:
WHERE conditions are boolean expressions, only TRUE qualifies, FALSE and NULL don't.
We need a technique to count (or concatenate) TRUE while discarding NULL and FALSE (or the exact opposite). The above expression counts TRUE as 1 and NULL or FALSE as 0.

There are various ways to achieve the same result:

A CASE expression is slightly more verbose, but typically fastest:

...    
ORDER  BY (CASE WHEN Condition1 THEN 1 ELSE 0 END
         + CASE WHEN Condition2 THEN 1 ELSE 0 END
         + CASE WHEN Condition3 THEN 1 ELSE 0 END) DESC;

We could fold FALSE to NULL with <expression> OR NULL and then use concat() which ignores NULL values. In descending sort order 'tt' sorts before 't' etc.:

...    
ORDER  BY concat(
          Condition1 OR NULL
        , Condition2 OR NULL
        , Condition3 OR NULL) DESC;

Or, probably shortest for many conditions. We don't even need to add parentheses to each condition - form an array and drop NULL and FALSE with with array_remove(), we can then sort by the array directly:

...    
ORDER  BY array_remove(array_remove(ARRAY[
             Condition1
           , Condition2
           , Condition3
          ], NULL), FALSE) DESC;

We could even use another WHERE clause use in a subselect and count() (but this is typically slower due to the added overhead):

...    
ORDER  BY (SELECT count(*)
           FROM ( VALUES
                 (Condition1)
               , (Condition2)
               , (Condition3)
              ) t(i)
           WHERE i) DESC;
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228