How to avoid cross product/Cartesian/full join in SQL?

Question

We have a table A with column "ETA To Destination" in minutes.

id	ETA To Dest (mins)	Score
1	45
2	75

There is a configuration table where we have defined scores based on the time intervals.

For example - If the ETA is between 31 to 60 mins, score is 5 points, ETA is between 61 to 120 mins, score is 10 points,

From Time	To Time	Score
0	30	2
31	60	5
61	120	10

We need to find the score (in table A) based on the ETA minutes in the configuration table (between from and to time).

I do not want to write a full join that will do a sequential scan in the table. Is there a better alternative or a better design.

Table A can have million of records and configuration table can have few more additional time interval rows.

Could you focus more on what you are trying to achieve instead of one possible solution? For example, can the scores change in time? How is the score used ie.: is there a report summarising milions of records or sth? Is the schema already in place and you need to optimise the queries without changing it? — Apokralipsa, Mar 18 '23 at 06:23
Considering the schema is already in place, is there a better way to achieve the score without using full join ? We will sort it based on the score and show it. There is no other requirement. — Magesh Narayanan, Mar 18 '23 at 06:37
You are not going to "show" millions of rows. What are you doing *exactly*? Show your version of Postgres, the exact table definition of `A` and the desired result - obviously just a few rows, not millions of them. — Erwin Brandstetter, Mar 18 '23 at 08:17
we will sort the records and will show top 50 records in a paginated order. — Magesh Narayanan, Mar 18 '23 at 08:20

Erwin Brandstetter · Accepted Answer · 2023-03-18T08:34:15.007

SELECT a.id, a.eta, s.score
FROM   a
JOIN   score_tbl s ON a.eta BETWEEN s.from_time AND s.to_time
ORDER  BY a.eta DESC NULLS LAST  -- how to break ties??
LIMIT  50;

With a matching index on a(eta) to support this query, it's a matter of milliseconds. The LIMIT 50 you mentioned in a comment makes all the difference. Ideally:

CREATE INDEX a_eta_id_idx on a (eta DESC NULLS LAST) INCLUDE (id);

We can keep it that simple because higher eta correlates to higher score.

Details depend on undisclosed information. Basically, any B-tree index with eta as leading or only column will do. But you'll have to match yet undisclosed specifications how to break ties.

DESC NULLS LAST only if eta can be null (isn't defined NOT NULL). Else just DESC. See:

Sort by column ASC, but NULL values first?

If there can be duplicates on eta, you'll have to define how to break ties. My query makes an arbitrary pick. But that won't be good enough for proper pagination. See:

SQL syntax term for 'WHERE (col1, col2) < (val1, val2)'

Thanks Erwin, I too have similar query, want to check if there is any other better option. Seems like we need to use join with eta and from time and to time. — Magesh Narayanan, Mar 19 '23 at 05:22

How to avoid cross product/Cartesian/full join in SQL?

1 Answers1