2

COALESCE in Postgres is a function that returns the first parameter not null. So I used coalesce in subqueries like:

SELECT COALESCE (
 ( SELECT * FROM users WHERE... ORDER BY ...),
 ( SELECT * FROM users WHERE... ORDER BY ...),
 ( SELECT * FROM users WHERE... ORDER BY ...),
 ( SELECT * FROM users WHERE... ORDER BY ...)
);

I change the where in any query and they contain lots of params and CASE, also different ORDER BY clauses. This is because I always want to return something but giving priorities.

What I noticed while issuing EXPLAIN ANALYZE is that any query is executed despite the first one actually returns NOT a null value. I would expect the engine to run only the first one query and not the following ones if it returns not null.

This way I could have a bad performance.

So am I doing any bad practice and is it better to run the queries separately for performance reason?

EDIT:

Sorry you where right I don’t select * but I select only one column. I didn’t post my code because I am not interested in my query but it’s a generic question to understand how the engine is working. So I reproduce a very simple fiddle here http://sqlfiddle.com/#!17/a8aa7/4

I may be wrong but I think it behaves as I was telling: it runs all the subqueries despite the first one already returns a not null value

EDIT 2: ok I read only now it says never executed. So the other two queries aren’t getting executed. What confused me was the fact they were included in the query plan.

Anyways it’s still important for my question. Is it better to run all the queries separately for performance reasons? Because it seems like that even if the first one returns a not null value the other two subqueries can slow down the performance

91DarioDev
  • 1,612
  • 1
  • 14
  • 31
  • Can you edit your question and include what you are you trying to get to? Give an example of the data you have, what result you wish to obtain and your thought process or SQL you've tried to attain the result. `COALESCE` is best used like this `coalesce(new_salary, old_salary, 5000)`, which means, if the new_salary is null, use old_salary. If old_salary is null, use 5000. – zedfoxus Mar 28 '20 at 21:58
  • 2
    The query isn't valid SQL to begin with. You can't combine `coalesce()` with a `select *`. `coalesce()` works on single values, not on all columns of a table. You might get better answers if you take a step back and explain the underlying problem you are trying to solve. Please **[edit]** your question (by clicking on the [edit] link below it) and add some sample data and the expected output based on that data as [formatted text](https://meta.stackoverflow.com/a/251362). See [here](https://meta.stackexchange.com/questions/81852) for some tips on how to create nice looking text tables. –  Mar 28 '20 at 22:01
  • I assume that this is not your actual code. All these queries should return 1 row and 1 column to be valid inside COALESCE. Also be sure that *COALESCE only evaluates the arguments that are needed to determine the result; that is, arguments to the right of the first non-null argument are not evaluated* from https://www.postgresql.org/docs/current/functions-conditional.html#FUNCTIONS-COALESCE-NVL-IFNULL. – forpas Mar 28 '20 at 22:19
  • Just to prove my previous comment check this: https://dbfiddle.uk/?rdbms=postgres_12&fiddle=b970f62eed695070d222cb1e69c93789 which runs fine because the 2nd argument of coalesce is not evaluated. If it was evaluated then the query would fail with an error because the resursive cte would exceed the max number of iterations. – forpas Mar 28 '20 at 22:21
  • @a_horse_with_no_name – 91DarioDev Mar 28 '20 at 22:47
  • In your edit2 you state "confused me was the fact they were included in the query plan", but they MUST be included. Keep in mind that when the planer (optimizer) creates the plan it DOES NOT access actual data, that is what it is determining how to do. Therefore, it must always **plan** for the worst case scenario, in this case running all the sub selects. – Belayer Mar 29 '20 at 22:11
  • @Belayer Ah, I see. Thanks – 91DarioDev Mar 30 '20 at 08:46

1 Answers1

4

For separate SELECT queries, I suggest to use UNION ALL / LIMIT 1 instead. Based on your fiddle:

(select user_id from users order by age limit 1)  -- misleading example, see below
UNION ALL
(select user_id from users where user_id=1)
UNION ALL
(select user_id from users order by user_id DESC limit 1)
LIMIT 1;

db<>fiddle here

For three reasons:

  1. Works for any SELECT list: single expressions (your fiddle), multiple or whole row (your example in the question).

  2. You can distinguish actual NULL values from "no row". Since user_id is the PK in the example (and hence, NOT NULL), the problem cannot surface in the example. But with an expression that can be NULL, COALESCE cannot distinguish between both, "no row" is coerced to NULL for the purpose of the query. See:

  3. Faster.

Aside, your first SELECT in the example makes this a wild-goose chase. It returns a row if there is at least one. The rest is noise in this case.

Related:

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Thank you for your answer it’s very interesting. Anyways I haven’t found any performance relevant differences rather than using coalesce with a single field in select – 91DarioDev Mar 29 '20 at 13:27