3

In PostgreSQL, I would like to select a row based on some criteria, but if no row matches the criteria, I would like to return the first row. The table actually contains an ordinal column, so the task should be easier (the first row is the one with ordinal 0). For example:

SELECT street, zip, city
FROM address
WHERE street LIKE 'Test%' OR ord = 0
LIMIT 1;

But in this case, there is no way to guarantee the order of the records that match, and I have nothing to order them by. What would be the way to do this using a single SELECT statement?

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
Nikša Baldun
  • 1,854
  • 4
  • 28
  • 39
  • 1
    You just need to set up `ORDER BY street LIKE 'Test%' DESC, ord = 0 DESC`. You can remove the `WHERE` clause too, if there is always an `ord = 0` row. – pozs Jun 10 '15 at 10:43
  • @pozs: Good point - except when `street` can be NULL (which is a common case in address data). I wrote more in my answer. – Erwin Brandstetter Jun 13 '15 at 00:26

4 Answers4

5

You are on the right track. Just add an order by:

SELECT street, zip, city
FROM address
WHERE street LIKE 'Test%' OR ord = 0
ORDER BY (CASE WHEN street LIKE 'Test%' THEN 1 ELSE 0 END) DESC
LIMIT 1;

Or, alternately:

ORDER BY ord DESC

Either of these will put the ord = 0 row last.

EDIT:

Erwin brings up a good point that from the perspective of index usage, an OR in the WHERE clause is not the best approach. I would modify my answer to be:

SELECT *
FROM ((SELECT street, zip, city
       FROM address
       WHERE street LIKE 'Test%'
       LIMIT 1
      )
      UNION ALL
      (SELECT street, zip, city
       FROM address
       WHERE ord = 0
       LIMIT 1
      )
     ) t
ORDER BY (CASE WHEN street LIKE 'Test%' THEN 1 ELSE 0 END) DESC
LIMIT 1;

This allows the query to make use of two indexes (street and ord). Note that this is really only because the LIKE pattern does not start with a wildcard. If the LIKE pattern starts with a wildcard, then this form of the query would still do a full table scan.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Of course... by ordinal descending... stupid of me not to see that. :-) – Nikša Baldun Jun 10 '15 at 10:46
  • 1
    Pretty poor for *all* of us not to see that. Well played @Gordon. – amcdermott Jun 10 '15 at 10:50
  • @amcdermott: I am not so fond of this answer, which fails to point out a couple of hidden problems. I added another answer. – Erwin Brandstetter Jun 12 '15 at 23:39
  • Your answer now addresses the NULL issue. But either query still returns an arbitrary pick from multiple matches. The added 2nd query is needlessly complex and *expensive*. The outer `ORDER BY` makes it impossible for the query planner to optimize the query. Both SELECTs are always executed, even if not necessary. If you strip the noise, you'll arrive at my second query. – Erwin Brandstetter Jun 13 '15 at 00:40
3

I would like to select a row based on some criteria, but if no row matches the criteria, I would like to return the first row

Shorter (and correct)

You don't actually need a WHERE clause at all:

SELECT street, zip, city
FROM   address
ORDER  BY street !~~ 'Test%', ord
LIMIT  1;

!~~ is just the Postgres operator for NOT LIKE. You can use either. Note that by inverting the logic (NOT LIKE instead of LIKE), we can now use default ASCsort order and NULLs sort last, which may be important. Read on.

This is shorter (but not necessarily faster). It is also subtly different (more reliable) than the currently accepted answer by @Gordon.

When sorting by a boolean expression you must understand how it works:

The currently accepted answer uses ORDER BY <boolean expression> DESC, which would sort NULLs first. In such a case you should typically add NULLS LAST:

If street is defined NOT NULL this is obviously irrelevant, but that has not been defined in the question. (Always provide the table definition.) The currently accepted answer avoids the problem by excluding NULL values in the WHERE clause.

Some other RDBMS (MySQL, Oracle, ..) don't have a proper boolean type like Postgres, so we often see incorrect advice from people coming from those products.

Your current query (as well as the currently accepted answer) need the WHERE clause - or at least NULLS LAST. With the different expression in ORDER BY neither is necessary.

More importantly, yet, if multiple rows have a matching street (which is to be expected), the returned row would be arbitrary and could change between calls - generally an undesirable effect. This query picks the row with the smallest ord to break ties and produces a stable result.

This form is also more flexible in that it does not rely on the existence of a row with ord = 0. Instead, the row with the smallest ord is picked either way.

Faster with index

(And still correct.) For big tables, the following index would radically improve performance of this query:

CREATE INDEX address_street_pattern_ops_idx ON address(street text_pattern_ops);

Detailed explanation:

Depending on undefined details it may pay to add more columns to the index.
The fastest query using this index:

(
SELECT street, zip, city
FROM   address
WHERE  street LIKE 'Test%'
ORDER  BY ord  -- or something else?
-- LIMIT 1  -- you *could* add LIMIT 1 in each leg
)
UNION ALL
(
SELECT street, zip, city
FROM   address
ORDER  BY ord
-- LIMIT 1  -- .. but that's not improving anything in *this* case
)
LIMIT  1

BTW, this is a single statement.

This is more verbose, but allows for a simpler query plan. The second SELECT of the UNION ALL is never executed if the first SELECT produces enough rows (in our case: 1). If you test with EXPLAIN ANALYZE, you'll see (never executed) in the query plan.

Details:

Evaluation of UNION ALL

In reply to Gordon's comment. Per documentation:

Multiple UNION operators in the same SELECT statement are evaluated left to right, unless otherwise indicated by parentheses.

Bold emphasis mine.
And LIMIT makes Postgres stop evaluating as soon as enough rows are found. That's why you see (never executed) in the output of EXPLAIN ANALYZE.

If you add an outer ORDER BY before the final LIMIT this optimization is not possible. Then all rows have to be collected to see which might sort first.

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Erwin . . . In Postgres, I don't sort by booleans (that was a MySQL-ism that crept in), so I changed that even before reading your answer. I added an edit to the post acknowledging your point. As usual, I learned something from your answer. – Gordon Linoff Jun 12 '15 at 23:53
  • 1
    Does Postgres really guarantee that the first subquery of a `UNION ALL` is executed before the second *and* that there is a short-circuit? I have tried researching this in other databases and wasn't able to find a trail of documentation (I agree that it might happen in practice; I'm wondering about a guarantee). – Gordon Linoff Jun 13 '15 at 01:59
  • @GordonLinoff: Good question (might be a fully fledged question, actually). I added a paragraph with a quote from the docs. – Erwin Brandstetter Jun 13 '15 at 02:14
  • Thank you very much. I've always been curious why these are not processed in parallel; I imagine that the Postgres approach is somehow implied by the standard. – Gordon Linoff Jun 13 '15 at 02:20
  • 1
    Like I said, the code I provided is an example. The actual criteria is user defined, so it could be anything, really. But as always, your answer provides invaluable details which help me to optimize my query. – Nikša Baldun Jun 15 '15 at 19:30
2

How about something like this... (I'm not familiar with PostgreSQL so syntax might be slightly off)

SELECT street, zip, city, 1 as SortOrder
FROM address
WHERE street LIKE 'Test%' 
-- 
union all
--
SELECT street, zip, city, 2 as SortOrder
FROM address
WHERE ord = 0
ORDER BY SortOrder
LIMIT 1;
amcdermott
  • 1,565
  • 15
  • 23
  • UNION ALL suggested (if you haven't decided to remove duplicates.) – jarlh Jun 10 '15 at 10:42
  • Fair point - also, you could have this as an inline table and just select street, zip and city to avoid including the ``SortOrder` column in your results. – amcdermott Jun 10 '15 at 10:44
  • If you strip the added `SortOrder` and the `ORDER BY`, you lose nothing and gain performance. I explain some more in my answer. – Erwin Brandstetter Jun 13 '15 at 00:47
0

You can do the following:

SELECT street, zip, city
FROM address
WHERE (EXISTS(SELECT * FROM address WHERE street LIKE 'Test%') AND street LIKE 'Test%') OR 
      (NOT EXISTS(SELECT * FROM address  WHERE street LIKE 'Test%') AND ord = 0)
Giorgi Nakeuri
  • 35,155
  • 8
  • 47
  • 75