Index for OR query where value is the same on both columns

Question

I have following use case:

My table (significant parts):

CREATE TABLE transfers (
    id int,
    product_id int,
    source_location_id int,
    destination_location_id int,
    quantity int,
    created_at datetime
);

Now my query is:

SELECT * 
FROM transfers 
WHERE (source_location_id = 123 OR destination_location_id = 123)
ORDER BY created_at DESC
LIMIT 100 OFFSET 400

At the moment that table contains 20 million rows and this query takes 2-3 minutes to finish)

Is there a way to create an index for that OR clause since this specific id will always be the same for both locations in that SELECT.

I have seen some people mentioning unions for OR clause, but I am using an ORM that does not support UNIONS nor SUBQUERIES, plus unions will not work with pagination.

AFAIK creating an index for both columns should fix your problem. Have you tried that? i.e. add `INDEX source_destination_location_Id (source_location_id, destination_location_id)` to your `CREATE TABLE` (https://dev.mysql.com/doc/refman/8.0/en/multiple-column-indexes.html) — tftd, May 29 '23 at 15:48
No, a multi-column index will not help. I wrote an explanation here: https://stackoverflow.com/questions/13750475/sql-performance-union-vs-or/13866221#13866221 — Bill Karwin, May 29 '23 at 15:55
If you are using an ORM that doesn't allow UNION or subquery, then I'm afraid you can't optimize this query while using the ORM methods. You will have to abandon use of the ORM and write the SQL directly, at least for this specific query. That should be ordinary use of any ORM — use the object methods for simple, repetitive queries, and use SQL directly for queries the ORM doesn't support well. — Bill Karwin, May 29 '23 at 15:58

ysth · Answer 1 · 2023-05-29T16:55:54.163

1

No, there is no index that will help this. To have this query indexed, you need a union, optimally like:

SELECT transfers.* 
FROM transfers 
JOIN (
    (
        SELECT id, created_at
        FROM transfers 
        WHERE source_location_id = 123
        ORDER BY created_at DESC
        LIMIT 500
    )
    UNION DISTINCT
    (
        SELECT id, created_at
        FROM transfers 
        WHERE destination_location_id = 123
        ORDER BY created_at DESC
        LIMIT 500
    )
) ids USING (id)
ORDER BY ids.created_at DESC
LIMIT 100 OFFSET 400

If source and destination are never the same, UNION ALL should be used instead.

Alternatively, add another table like:

CREATE transfer_locations (
    transfer_id int,
    location_id int,
    created_at datetime,
    primary key (location_id, transfer_id)
)

with rows for the source and destination for each transfer, and select from that, joining transfers.

edited May 29 '23 at 16:55

answered May 29 '23 at 16:01

ysth

96,171
6
121
214

the `DISTINCT` should go, and some `()` are needed, see: https://dbfiddle.uk/ZGTStOAh And I am not sure about the index `transfer_locations` – Luuk May 29 '23 at 16:11
@Luuk no, if source and destination are the same, DISTINCT is required. thanks for catching the parens issue – ysth May 29 '23 at 16:13
2

oh, you want just UNION instead of UNION DISTINCT, which defaults to UNION DISTINCT. I prefer to always have it show – ysth May 29 '23 at 16:14
The DISTINCT is not mandatory, and I always leave it out. (I cannot find the docs that it can be left way .... ) – Luuk May 29 '23 at 16:24
you almost always should do UNION ALL, so I prefer to call out the few times DISTINCT is actually needed by leaving it in. re docs, "By default, duplicate rows are removed from results of set operations. The optional DISTINCT keyword has the same effect but makes it explicit" - https://dev.mysql.com/doc/refman/8.0/en/set-operations.html#set-operations-distinct-all – ysth May 29 '23 at 16:53

Index for OR query where value is the same on both columns

1 Answers1