Optimizing JOIN ON OR without repeating query for every OR

Question

As this answer explains, a JOIN ON combined with OR can't be optimized. I've indeed noticed the horrible performance in a query that I'm trying to write.

To describe my scenario, a header record should be returned along with data from all related item records. An item can be related to a header record based on one of three fields. The following SQL seems syntactically sound but is hideously expensive as it can't optimize the JOIN ON OR:

SELECT
    header.a,
    header.b,
    item.x,
    item.y,
    item.z
    FROM header
    LEFT OUTER JOIN item    ON item.x   = header.a
                            OR item.y   = header.a
                            OR item.z   = header.b;

Note that this was generalised from a query that is a great deal more complex (there are another 6 joins and various filters involved). I expect that the JOIN ON OR implementation will work, though I can't confirm it as the full table scan that results from the OR is likely to take several hours to complete. (Bonus question: will the triple condition lead to a single sequential FTS or to three consecutive ones?)

Because of the complexity of the surrounding query I want to avoid the UNION ALL approach suggested on the linked answer. It's not just the fact that I want to avoid that level of repetition but also that the rest of the query, while optimized, is itself quite expensive. Is there an alternative that I'm not seeing?

Which dbms are you using? (Different products optimize things different ways...) — jarlh, Aug 18 '15 at 10:07
@gotqn Around 70. I'm selecting 8 of them. It's a column store if that's relevant. — Lilienthal, Aug 18 '15 at 10:30
@Lilienthal I am not familiar with `hana` but in `t-sql` you can create additional `filtering` indexes with `included` columns which can optimize the performance reducing the `read` operations. — gotqn, Aug 18 '15 at 10:35
@gotqn Thanks for the advice. It wasn't until after I posted that I realised that the limited popularity of HANA could make it difficult for people to answer. I was hoping there was something in the standard SQL syntax around my problem but there might not be such an alternative to `UNION`. — Lilienthal, Aug 18 '15 at 10:40

score 1 · Answer 1 · answered Aug 18 '15 at 15:03

Although I see your point of having a whole lot of complexity going on in your overall query, I propose to split this execution and actually make use of SQLScript in this case.

As you have effectively three different ways to related header and item, that makes three queries. With SQLScript it's fairly straight forward to have those three simple join queries assigned to three separate table variables.

Each of those joins can then be processed very efficiently and also all three of them can be executed at once. This will not reduce the overall work but the total execution time.

Also, depending on the nature of your query you could union the three table variables and use them as the input into the complex remaining calculations.

In any case: these are three independent linkages between your tables and it seems to be most elegant and efficient to not try to cram them together in one "mother of all SQL" statements :-)

Hmm, good point, I might go this way as I've indeed hit a wall in trying to create some monstrous single SQL statement. I assume that I'd implement that SQL Script as an ABAP Managed Database Procedure? — Lilienthal, Aug 18 '15 at 15:24

score 1 · Accepted Answer · answered Aug 18 '15 at 16:11

1

This might look ugly, but it avoids the OR, and has a chance of using indexes, if present:

SELECT
    header.a
    , header.b
    , COALESCE(i1.x, i2.x, i3.x) AS itemx
    , COALESCE(i1.y, i2.y, i3.y) AS itemy
    , COALESCE(i1.z, i2.z, i3.z) AS itemz
    FROM header h0
    LEFT OUTER JOIN item i1 ON i1.x = h0.a
    LEFT OUTER JOIN item i2 ON i2.y = h0.a
    LEFT OUTER JOIN item i3 ON i3.z = h0.b
        ;

Note: it does assume that item.{x.y,z} are not NULLable.

answered Aug 18 '15 at 16:11

wildplasser

43,142
8
66
109

Hmm, very interesting approach, this might be a good single-query solution. I won't be able to verify this for my scenario until a few days from now but this looks sound in theory, +1. – Lilienthal Aug 18 '15 at 17:07
Confirmed in the SQL editor. SAP HANA docu entry for COALESCE is [here](https://help.sap.com/saphelp_hanaplatform/helpdata/en/20/db6dd7751910149105978452f92bbd/content.htm). I had to join another table to the item table which seems to work with `INNER JOIN subitem ON subitem.key = COALESCE(i1.key, i2.key, i3.key)` – Lilienthal Aug 20 '15 at 08:41
Note: the `ON xxx = COALESCE(a,b,c, ...)` still has the multiple -OR JOIN problem (but in a different form) check your query plan; it might be *very* subobtimal. Also verify the NULLability (the query could yield wrong results if {x,y,z} are NULLable) – wildplasser Aug 20 '15 at 08:47
Agreed, my query is definitely still a work in progress, I'm evaluating whether the COALESCE will work (efficiently) for my particular scenario. That said, this solution works for the core question, hence the accept. – Lilienthal Aug 20 '15 at 09:08
I don't know HANA's query generator, but making the OR'red UNION more explicit could help it. – wildplasser Aug 20 '15 at 09:19

Optimizing JOIN ON OR without repeating query for every OR

2 Answers2