0

Ex.

SELECT *
FROM A
JOIN B ON A.idx = B.idx
JOIN C ON A.idx = C.idx
WHERE A.create_dt BETWEEN '2023-05-01' AND '2023-05-31'
  OR A.last_dt BETWEEN '2023-05-01' AND '2023-05-31'
  OR B.create_dt BETWEEN '2023-05-01' AND '2023-05-31'
  OR B.last_dt BETWEEN '2023-05-01' AND '2023-05-31'
  OR C.create_dt BETWEEN '2023-05-01' AND '2023-05-31'
  OR C.last_dt BETWEEN '2023-05-01' AND '2023-05-31';

It is a PostgreSQL DB

While multi-joining the tables, I want to SELECT only the data where the create_dt or last_dt of each table is the latest.
However, there is a problem that the speed is too slow. How to solve in this case?

Performance is fine when I leave only the A table WHERE condition.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
younghyun
  • 341
  • 1
  • 8
  • 1
    *I want to SELECT only the data where the create_dt or last_dt of each table is **the latest**.* ?? This is unclear. The latest in what data collection? within the row? the group? the table? total rowset? Show an example.. – Akina Jun 02 '23 at 04:58
  • [So do you have your answer?](https://meta.stackexchange.com/a/5235/169168) – Erwin Brandstetter Jun 24 '23 at 04:00

2 Answers2

2

I want to SELECT only the data where the create_dt or last_dt of each table is the latest.

Then you certainly cannot join on idx since the latest per table won't share the same idx value.

Your query does not do what you say. This one does:

SELECT *
FROM  (
   (
   SELECT GREATEST(create_dt, last_dt) AS latest_a, *
   FROM   a
   WHERE  create_dt BETWEEN '2023-05-01' AND '2023-05-31'
   ORDER  BY create_dt DESC
   LIMIT 1
   )
   UNION ALL
   (
   SELECT GREATEST(create_dt, last_dt) AS latest_a, *
   FROM   a
   WHERE  last_dt BETWEEN '2023-05-01' AND '2023-05-31'
   ORDER  BY last_dt DESC
   LIMIT 1
   )
   ORDER BY latest_a
   LIMIT 1
   ) a
CROSS JOIN  (
   (
   SELECT GREATEST(create_dt, last_dt) AS latest_b, *
   FROM   b
   WHERE  create_dt BETWEEN '2023-05-01' AND '2023-05-31'
   ORDER  BY create_dt DESC
   LIMIT 1
   )
   UNION ALL
   (
   SELECT GREATEST(create_dt, last_dt) AS latest_b, *
   FROM   b
   WHERE  last_dt BETWEEN '2023-05-01' AND '2023-05-31'
   ORDER  BY last_dt DESC
   LIMIT 1
   )
   ORDER BY latest_b
   LIMIT 1
   ) b
CROSS JOIN  (
   (
   SELECT GREATEST(create_dt, last_dt) AS latest_c, *
   FROM   a
   WHERE  create_dt BETWEEN '2023-05-01' AND '2023-05-31'
   ORDER  BY create_dt DESC
   LIMIT 1
   )
   UNION ALL
   (
   SELECT GREATEST(create_dt, last_dt) AS latest_c, *
   FROM   a
   WHERE  last_dt BETWEEN '2023-05-01' AND '2023-05-31'
   ORDER  BY last_dt DESC
   LIMIT 1
   )
   ORDER BY latest_c
   LIMIT 1
   ) c -- USING (idx);

All parentheses required.
A bit verbose. But it's as fast as this gets - provided you have these indexes:

CREATE INDEX a_create_dt_idx ON A (create_dt);
CREATE INDEX a_last_dt_idx ON A (last_dt);

CREATE INDEX b_create_dt_idx ON B (create_dt);
CREATE INDEX b_last_dt_idx ON B (last_dt);

CREATE INDEX c_create_dt_idx ON C (create_dt);
CREATE INDEX c_last_dt_idx ON C (last_dt);

It will be two index seeks per table, directly picking the one qualifying row every time.

I am joining with an unconditional CROSS JOIN, since each subquery returns exactly one row, provided at least one qualifies.

If one of the subqueries finds no row, the result is empty. Maybe you really want a FULL OUTER JOIN to preserve results from the other tables if one comes up empty. Or just 3 result rows.

Then again, I wouldn't be surprised if you didn't exactly say what you really need. My educated guess: you want something like this (the latest row per idx from each table):

(
SELECT DISTINCT ON (idx) 'a_latest_create_dt' AS what, *
FROM   a
WHERE  create_dt BETWEEN '2023-05-01' AND '2023-05-31'
ORDER  BY idx DESC, create_dt DESC, last_dt DESC
)
UNION ALL
(
SELECT DISTINCT ON (idx) 'a_latest_last_dt' AS what, *
FROM   a
WHERE  last_dt BETWEEN '2023-05-01' AND '2023-05-31'
ORDER  BY idx DESC, last_dt DESC, create_dt DESC
)

UNION ALL
(
SELECT DISTINCT ON (idx) 'b_latest_create_dt' AS what, *
FROM   b
WHERE  create_dt BETWEEN '2023-05-01' AND '2023-05-31'
ORDER  BY idx DESC, create_dt DESC, last_dt DESC
)
UNION ALL
(
SELECT DISTINCT ON (idx) 'b_latest_last_dt' AS what, *
FROM   b
WHERE  last_dt BETWEEN '2023-05-01' AND '2023-05-31'
ORDER  BY idx DESC, last_dt DESC, create_dt DESC
)

UNION ALL
(
SELECT DISTINCT ON (idx) 'c_latest_create_dt' AS what, *
FROM   c
WHERE  create_dt BETWEEN '2023-05-01' AND '2023-05-31'
ORDER  BY idx DESC, create_dt DESC, last_dt DESC
)
UNION ALL
(
SELECT DISTINCT ON (idx) 'c_latest_last_dt' AS what, *
FROM   c
WHERE  last_dt BETWEEN '2023-05-01' AND '2023-05-31'
ORDER  BY idx DESC, last_dt DESC, create_dt DESC
)
ORDER BY idx, what;

This time I list the latest rows per idx for each table: one for create_dt, one for last_dt. Makes 6 rows if there is one in the time frame for every flavor.

About DISTINCT ON:

Assuming all involved columns are NOT NULL. Else you may have to do more ...

Matching indexes:

CREATE INDEX a_create_dt_idx ON A (idx, create_dt);
CREATE INDEX a_last_dt_idx ON A (idx, last_dt);

CREATE INDEX b_create_dt_idx ON B (idx, create_dt);
CREATE INDEX b_last_dt_idx ON B (idx, last_dt);

CREATE INDEX c_create_dt_idx ON C (idx, create_dt);
CREATE INDEX c_last_dt_idx ON C (idx, last_dt);
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
1

You might try adding the following indices to the three tables:

CREATE INDEX idx_a ON A (idx, create_dt, last_dt);
CREATE INDEX idx_b ON B (idx, create_dt, last_dt);
CREATE INDEX idx_c ON C (idx, create_dt, last_dt);

These indices, if used, should speed up the joins in your query.

Tim Biegeleisen
  • 502,043
  • 27
  • 286
  • 360
  • Thank you. My cowoker said me that date columns aren't usually used as indexes. So the index request was denied. – younghyun Jun 02 '23 at 02:07
  • 1
    @younghyun; Date columns are as good as any for indexes. Question is whether the index fits the query, and whether the filter is selective enough. – Erwin Brandstetter Jun 02 '23 at 04:02