
I am trying to determine whether a function like EXISTS or COALESCE could reduce the heavy use of nested loops when searching for a match on any of 300 values that could occur in any of 16 different columns.

A user supplied a query that looks like the example below and asked me to apply it in a large warehouse environment, where it does not scale well. Adding indexes is not an option. I can use a WITH clause to house the 300 codes for lookup, but I cannot determine whether that would help the query abort the table scans once the first match is found and move on to the next record.

SELECT
    A.product_id
FROM
    A
    INNER JOIN B ON A.id = B.id
    INNER JOIN C ON A.id = C.id
WHERE
    B.code1 IN ('x1','x2','x3', ... 'x298','x299','x300')
 OR B.code2 IN ('x1','x2','x3', ... 'x298','x299','x300')
 OR C.code3 IN ('x1','x2','x3', ... 'x298','x299','x300')
 ...
 OR C.code15 IN ('x1','x2','x3', ... 'x298','x299','x300')
 OR C.code16 IN ('x1','x2','x3', ... 'x298','x299','x300')

While the supplied query works, its cost is high, and I would appreciate help in improving the execution times.
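To make the question concrete, here is a toy reproduction in Python's `sqlite3` (standing in for the warehouse database; the table contents and the three-code list are made-up placeholders for the real data and the 300 codes). It checks that housing the codes in a WITH clause and probing them with EXISTS returns the same rows as the OR/IN chain:

```python
import sqlite3

# Toy stand-ins for the real warehouse tables (names and data are made up).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE A (id INTEGER, product_id TEXT);
CREATE TABLE B (id INTEGER, code1 TEXT, code2 TEXT);
CREATE TABLE C (id INTEGER, code3 TEXT, code16 TEXT);
INSERT INTO A VALUES (1,'p1'),(2,'p2'),(3,'p3');
INSERT INTO B VALUES (1,'x1','zz'),(2,'zz','zz'),(3,'zz','x2');
INSERT INTO C VALUES (1,'zz','zz'),(2,'zz','x3'),(3,'zz','zz');
""")

# Original shape: one OR ... IN (...) predicate per code column.
or_in = """
SELECT A.product_id
FROM A
JOIN B ON A.id = B.id
JOIN C ON A.id = C.id
WHERE B.code1 IN ('x1','x2','x3')
   OR B.code2 IN ('x1','x2','x3')
   OR C.code3 IN ('x1','x2','x3')
   OR C.code16 IN ('x1','x2','x3')
"""

# Candidate rewrite: the codes live in a WITH clause and are probed
# with EXISTS, which can stop at the first matching code per row.
with_exists = """
WITH codes(code) AS (VALUES ('x1'),('x2'),('x3'))
SELECT A.product_id
FROM A
JOIN B ON A.id = B.id
JOIN C ON A.id = C.id
WHERE EXISTS (SELECT 1 FROM codes
              WHERE codes.code IN (B.code1, B.code2, C.code3, C.code16))
"""

r1 = sorted(con.execute(or_in).fetchall())
r2 = sorted(con.execute(with_exists).fetchall())
print(r1 == r2)  # same rows from both shapes
```

Whether the EXISTS form actually lets the optimizer short-circuit the scans is the open question; the execution plan would need to confirm it.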

F001
    Fix your data model! Storing values in numbered columns is a sign of a problem. If you had a junction/association table with one row per code, then the solution would be simpler. – Gordon Linoff Jun 05 '19 at 18:36
  • For this problem, I agree. However, this is an environment where such an architectural change could easily take a year or longer from my request to its implementation if approved. – F001 Jun 05 '19 at 20:01
  • Describe the problem you observe in more detail. How large are your tables, and how selective are the predicates? Publish the execution plan that is used (see [here](https://stackoverflow.com/a/34975420/4808122) for advice on what information is helpful in case of performance problems). And *yes*, nested loops on large tables may cause problems - the last resort to get rid of them is hints; [USE_HASH](https://docs.oracle.com/en/database/oracle/oracle-database/18/sqlrf/Comments.html#GUID-FA1147B3-BCAA-41F9-B6A2-8DEDABF1C021) is a possible remedy. – Marmite Bomber Jun 05 '19 at 21:34

2 Answers


One simple improvement would be splitting this into two queries:

SELECT A.product_id
FROM A JOIN
     B 
     ON A.id = B.id
WHERE B.code1 in ('x1','x2','x3', ... 'x298', 'x299', 'x300') OR
      B.code2 in ('x1','x2','x3', ... 'x298', 'x299', 'x300')
UNION ALL   -- or UNION to remove duplicates
SELECT A.product_id
FROM A JOIN
     C 
     ON A.id = C.id
WHERE C.code3 in ('x1','x2','x3', ... 'x298', 'x299', 'x300') OR
      ...
      C.code15 in ('x1','x2','x3', ... 'x298', 'x299', 'x300') OR
      C.code16 in ('x1','x2','x3', ... 'x298', 'x299', 'x300');

As I say in the comment, a table with one row per id and code would be a better data model.
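A quick sanity check of the split, using made-up mini tables in `sqlite3`. Note the split is only equivalent when every id appears in both B and C, as the inner joins in the original already require, and UNION (rather than UNION ALL) dedupes ids that match in both branches:

```python
import sqlite3

# Placeholder mini tables; every id appears in both B and C, as the
# inner joins in the original query already require.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE A (id INTEGER, product_id TEXT);
CREATE TABLE B (id INTEGER, code1 TEXT, code2 TEXT);
CREATE TABLE C (id INTEGER, code3 TEXT, code16 TEXT);
INSERT INTO A VALUES (1,'p1'),(2,'p2'),(3,'p3');
INSERT INTO B VALUES (1,'x1','zz'),(2,'zz','zz'),(3,'zz','zz');
INSERT INTO C VALUES (1,'zz','x2'),(2,'zz','x3'),(3,'zz','zz');
""")

original = """
SELECT A.product_id
FROM A
JOIN B ON A.id = B.id
JOIN C ON A.id = C.id
WHERE B.code1 IN ('x1','x2','x3')
   OR B.code2 IN ('x1','x2','x3')
   OR C.code3 IN ('x1','x2','x3')
   OR C.code16 IN ('x1','x2','x3')
"""

# The split: each branch touches only one of B or C; UNION removes
# the duplicate for id 1, which matches in both branches.
split = """
SELECT A.product_id FROM A JOIN B ON A.id = B.id
WHERE B.code1 IN ('x1','x2','x3') OR B.code2 IN ('x1','x2','x3')
UNION
SELECT A.product_id FROM A JOIN C ON A.id = C.id
WHERE C.code3 IN ('x1','x2','x3') OR C.code16 IN ('x1','x2','x3')
"""

orig_rows = sorted(con.execute(original).fetchall())
split_rows = sorted(con.execute(split).fetchall())
print(orig_rows == split_rows)  # same distinct product_ids
```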

Gordon Linoff
  • Why should it automatically be a *better* model? With 100 codes, instead of joining two one-million-row tables you would have to join two hundred-million-row tables and finally *distinct* the result... – Marmite Bomber Jun 06 '19 at 19:44

Try joining against the code list. I'm curious whether this will improve your performance:

select A.product_id
from A
inner join B on B.id = A.id
inner join C on C.id = A.id
inner join (values ('x1'),('x2'),('x3'), ... ('x298'), ('x299'), ('x300')) as v(code)
    on v.code = B.code1 
    or v.code = B.code2
    or v.code = C.code3
    ...
    or v.code = C.code16

Alternatively, you could unpivot the B and C tables so that all of the codeN column values are in one column, and then filter that single column by your 300 values.
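A toy version of the values-list join in `sqlite3` (the code list is shortened to three placeholders, and written as a CTE in place of the VALUES derived table; DISTINCT is added because a row matching several codes would otherwise be returned once per matching code):

```python
import sqlite3

# Placeholder mini tables (the real list has 300 codes, shortened here).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE A (id INTEGER, product_id TEXT);
CREATE TABLE B (id INTEGER, code1 TEXT, code2 TEXT);
CREATE TABLE C (id INTEGER, code3 TEXT, code16 TEXT);
INSERT INTO A VALUES (1,'p1'),(2,'p2'),(3,'p3');
INSERT INTO B VALUES (1,'x1','zz'),(2,'zz','zz'),(3,'zz','zz');
INSERT INTO C VALUES (1,'zz','zz'),(2,'x3','zz'),(3,'zz','zz');
""")

# Join against the code list; the CTE plays the role of the VALUES
# derived table. DISTINCT collapses rows that match several codes.
values_join = """
WITH v(code) AS (VALUES ('x1'),('x2'),('x3'))
SELECT DISTINCT A.product_id
FROM A
JOIN B ON B.id = A.id
JOIN C ON C.id = A.id
JOIN v ON v.code IN (B.code1, B.code2, C.code3, C.code16)
"""

rows = sorted(con.execute(values_join).fetchall())
print(rows)  # ids whose rows contain at least one listed code
```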

reasonet