
I have two tables:

orders

| id | item_id | quantity | ordered_on |
|----|---------|----------|------------|
|  1 |    1    |    2     | 2016-03-09 |
|  2 |    1    |    2     | 2016-03-12 |
|  3 |    4    |    3     | 2016-03-15 |
|  4 |    4    |    3     | 2016-03-13 |

stocks

| id | item_id | quantity | enter_on   | expire_on  |
|----|---------|----------|------------|------------|
|  1 |    1    |   10     | 2016-03-07 | 2016-03-10 |
|  2 |    1    |   20     | 2016-03-11 | 2016-03-15 |
|  3 |    1    |   20     | 2016-03-14 | 2016-03-17 |
|  4 |    4    |   10     | 2016-03-14 |    NULL    |
|  5 |    4    |   10     | 2016-03-12 |    NULL    |

I'm trying to create a view that shows each order along with the closest stock enter_on dates, like this. (I'm using include_after and include_before to mark the date range in which the preordered item should be excluded, so the stock reflects correctly.)

include_after is always the enter_on of the stock that came in but has not expired yet (NULL if it has expired); include_before is always the enter_on of the next incoming stock, unless there's an expire_on earlier than that next enter_on.

| item_id | quantity | ordered_on | include_after | include_before |
|---------|----------|------------|---------------|----------------|
|    1    |    2     | 2016-03-09 |  2016-03-07   |   2016-03-10   |
|    1    |    2     | 2016-03-12 |  2016-03-11   |   2016-03-14   |
|    4    |    3     | 2016-03-13 |  2016-03-12   |   2016-03-14   |
|    4    |    3     | 2016-03-15 |  2016-03-14   |      NULL      |
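For reference, the example data above can be reproduced with a schema like this (the column types are my assumption; the question doesn't state them):

```sql
-- Assumed schema for the sample data; types are guesses.
CREATE TABLE orders (
  id         serial PRIMARY KEY,
  item_id    int  NOT NULL,
  quantity   int  NOT NULL,
  ordered_on date NOT NULL
);

CREATE TABLE stocks (
  id         serial PRIMARY KEY,
  item_id    int  NOT NULL,
  quantity   int  NOT NULL,
  enter_on   date NOT NULL,
  expire_on  date            -- NULL = never expires
);

INSERT INTO orders (item_id, quantity, ordered_on) VALUES
  (1, 2, '2016-03-09'), (1, 2, '2016-03-12'),
  (4, 3, '2016-03-15'), (4, 3, '2016-03-13');

INSERT INTO stocks (item_id, quantity, enter_on, expire_on) VALUES
  (1, 10, '2016-03-07', '2016-03-10'),
  (1, 20, '2016-03-11', '2016-03-15'),
  (1, 20, '2016-03-14', '2016-03-17'),
  (4, 10, '2016-03-14', NULL),
  (4, 10, '2016-03-12', NULL);
```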

So this is what I came up with:

SELECT
  o.item_id, o.quantity, o.ordered_on, (
    SELECT COALESCE(MAX(s.enter_on), NULL::DATE)
    FROM stocks s
    WHERE s.enter_on <= o.ordered_on AND s.item_id = o.item_id
  ) AS include_after, (
    SELECT COALESCE(MIN(s.enter_on), NULL::DATE)
    FROM stocks s
    WHERE s.enter_on > o.ordered_on AND s.item_id = o.item_id
  ) AS include_before
FROM
  orders o;

It works fine (I haven't included the expire_on part yet), but I'm worried about the performance of using two subqueries in the SELECT.

Does anyone have some alternative suggestions?

UPDATE

I'm using PostgreSQL 9.4 (can't add any more tags).

The actual problem is far more complicated than stated here: it involves many tables and views joined together. I shrank it down to a single table to grasp the concept, in case there are alternatives.

lusketeer

2 Answers


You should worry about performance when the situation actually arises. For the example you provided, an index on stocks(item_id, enter_on, expire_on) should be sufficient. You might then actually want two indexes, the second being stocks(item_id, enter_on DESC, expire_on).
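Spelled out as DDL (Postgres picks the index names if you don't supply any):

```sql
-- Single covering index for the example query:
CREATE INDEX ON stocks (item_id, enter_on, expire_on);

-- Optional second index, ordered descending on enter_on:
CREATE INDEX ON stocks (item_id, enter_on DESC, expire_on);
```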

If the performance is not sufficient, you have two choices. One is a GiST index for ranges. (Here is an interesting discussion of the issue.) The other is an alternative query formulation.

However, I would not attempt to optimize the query until there is enough data to show a performance problem. Solutions tuned on small amounts of data might not scale well.

Gordon Linoff

This discusses the query you display, likewise without considering expire_on.

COALESCE for a correlated subquery

First off, the expression COALESCE(anything, NULL) never makes sense: you would be replacing NULL with NULL.

Aggregate functions like max() return NULL anyway if no qualifying row is found; they never return "no row". (The exception is count(), which returns 0.)
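A quick way to see the difference, using the stocks table above and an item_id that matches nothing:

```sql
SELECT max(enter_on) FROM stocks WHERE item_id = 999;  -- one row, containing NULL
SELECT count(*)      FROM stocks WHERE item_id = 999;  -- one row, containing 0
SELECT enter_on      FROM stocks WHERE item_id = 999;  -- no row at all
```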

A correlated subquery that would return "no row" (like the variant with ORDER BY ... LIMIT 1 I demonstrate below) defaults to NULL for the column value.

So, if you want to use COALESCE in this context, wrap it around the correlated subquery as a whole and provide a default for NULL.
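For illustration only, this is what a sensible use of COALESCE would look like, wrapping the whole subquery. The '-infinity' default is a placeholder of my choosing, not something from the question:

```sql
SELECT o.item_id, o.quantity, o.ordered_on
     , COALESCE( (SELECT max(s.enter_on)
                  FROM   stocks s
                  WHERE  s.item_id = o.item_id
                  AND    s.enter_on <= o.ordered_on)
               , '-infinity'::date ) AS include_after  -- default when no stock has entered yet
FROM   orders o;
```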

Query

I'm worried about the performance of using two subqueries in the SELECT.

It depends.

If there are only a few rows per item_id in table stocks, and / or only an index on stocks(item_id), then it makes sense to merge the two correlated subqueries into one LATERAL subquery with conditional aggregates:

SELECT o.item_id, o.quantity, o.ordered_on
     , s.include_after, s.include_before
FROM  orders o
    , LATERAL (
   SELECT max(enter_on) FILTER (WHERE enter_on <= o.ordered_on) AS include_after
        , min(enter_on) FILTER (WHERE enter_on >  o.ordered_on) AS include_before
   FROM   stocks
   WHERE  item_id = o.item_id
   ) s;

Since the subquery returns a row in any case due to the aggregate functions, a simple CROSS JOIN is fine. Otherwise you would use LEFT JOIN LATERAL (...) ON true.

The aggregate FILTER clause requires Postgres 9.4 or later. There are alternatives for older versions.
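For example, a standard CASE expression inside the aggregate is equivalent and already works on Postgres 9.3, which has LATERAL (a sketch, not taken from the original answer):

```sql
-- CASE WHEN ... END yields NULL for non-matching rows,
-- and max()/min() ignore NULLs, so this matches FILTER semantics.
SELECT o.item_id, o.quantity, o.ordered_on
     , s.include_after, s.include_before
FROM  orders o
    , LATERAL (
   SELECT max(CASE WHEN enter_on <= o.ordered_on THEN enter_on END) AS include_after
        , min(CASE WHEN enter_on >  o.ordered_on THEN enter_on END) AS include_before
   FROM   stocks
   WHERE  item_id = o.item_id
   ) s;
```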

If, on the other hand, you have many rows per item_id in table stocks and an index ON stocks (item_id, enter_on), your query might still be faster. Or this slightly adapted version (test both!):

SELECT o.item_id, o.quantity, o.ordered_on
   , (SELECT s.enter_on
      FROM   stocks s
      WHERE  s.item_id = o.item_id
      AND    s.enter_on <= o.ordered_on
      ORDER  BY 1 DESC NULLS LAST
      LIMIT  1) AS include_after
   , (SELECT s.enter_on
      FROM   stocks s
      WHERE  s.item_id = o.item_id
      AND    s.enter_on > o.ordered_on
      ORDER  BY 1
      LIMIT  1) AS include_before
FROM  orders o;

This is because each correlated subquery can be resolved with a single index lookup.

To optimize performance, you might need a second index on stocks(item_id, enter_on DESC NULLS LAST). But don't create specialized indexes unless you actually need to squeeze out more read performance for this query (key word: premature optimization).
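If it does come to that, the pair of indexes might look like this (names left to Postgres defaults):

```sql
-- Ascending index: serves the include_before (MIN / ORDER BY ASC) lookup.
CREATE INDEX ON stocks (item_id, enter_on);

-- Descending index: serves the include_after (ORDER BY DESC NULLS LAST) lookup.
CREATE INDEX ON stocks (item_id, enter_on DESC NULLS LAST);
```

Note that Postgres can also scan a b-tree index backward, so with a NOT NULL enter_on a single ascending index often covers both lookups; the second index is only worth adding if measurements show a problem.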

There is a detailed discussion in this related answer.

Erwin Brandstetter
  • Thanks for the suggestions. The actual problem is far more complicated than I stated: it's a lot of tables and views joined together, and I shrank it down to just one table to grasp the concept, in case there are alternatives. Looks like I have a lot of reading to do. – lusketeer Jun 12 '16 at 15:25