Select IN from multiple columns

Question

So I have a table, lets call it MAIN table, it has the following example columns

Name,
Code_1,
Code_2,
Code_3,
Code_4,
Code_5

(in my real example there's 25 Code columns)

I have a set of 300 codes that I inserted into a temporary table, what would be the best way to get the rows from the MAIN table where it matches a code from the temporary table?

Here's what I have so far that works, but it seems extremely inefficient

SELECT * FROM MAIN WHERE (CODE_1 IN (SELECT CODE FROM TMP_TABLE) 
OR CODE_2 IN(SELECT CODE FROM TMP_TABLE) 
OR CODE_3 IN (SELECT CODE FROM TMP_TABLE)
OR CODE_4 IN (SELECT CODE FROM TMP_TABLE)
OR CODE_5 IN (SELECT CODE FROM TMP_TABLE))

25 code columns sounds like a bad datamodel. It is not necessarily so, but it seems so. Usually you'd rather have a main table with a name and some ID plus a child table with one record per main ID and code. With such a data model the query would be very simple to write. — Thorsten Kettner, Apr 24 '18 at 19:40
@ThorstenKettner I agree, that's pretty bad. At least it's not a single text column with comma-separated codes, so I'd say it's an improvement. — Sergey Kalinichenko, Apr 24 '18 at 19:43
Unfortunately I don't have control over the data model, long story short due to the way that data flows through our systems it does somewhat make sense as we aren't trying to break the data down at that point in the system. — Dcarr, Apr 24 '18 at 19:47
Then I'd use dasblinkenlight's query. Seems like the best approach. — Thorsten Kettner, Apr 24 '18 at 19:53

score 4 · Accepted Answer · answered Apr 24 '18 at 19:38

4

One approach would be to use a correlated subquery:

SELECT *
FROM MAIN m
WHERE EXISTS (
    SELECT *
    FROM TMP_TABLE t
    WHERE t.CODE = m.CODE_1 OR t.CODE = m.CODE_2 OR ...
)

answered Apr 24 '18 at 19:38

Sergey Kalinichenko

714,442
84
1,110
1,523

I wouldn't be surprised if this had better performance than the OP's version. – Gordon Linoff Apr 24 '18 at 21:34
1

. . You misunderstood. I think this would have better performance because Oracle implements skip-scan index searches. – Gordon Linoff Apr 25 '18 at 03:12

Raunak Thomas · Answer 2 · 2018-04-24T19:43:38.370

2

A join would be faster

SELECT * FROM MAIN 
inner join TMP_TABLE
on main.code_1 = tmp_table.code 
    or main.code_2 = tmp_table.code 
    or main.code_3 = tmp_table.code
    or main.code_4 = tmp_table.code
    or main.code_5 = tmp_table.code

But as mentioned in the comment, the join could potentially increase the number of rows if in the main table multiple code_## match the join criteria in the tmp_table

edited Apr 24 '18 at 19:43

answered Apr 24 '18 at 19:39

Raunak Thomas

1,393
1
12
28

Why would a `JOIN` be faster? I don't see this. – Gordon Linoff Apr 24 '18 at 21:33
I've always been kind of confused about which one is faster, and from what I have read, it depends on the execution plan. But in general a join is faster than a subquerry. https://stackoverflow.com/questions/141278/subqueries-vs-joins/141310#141310 – Raunak Thomas Apr 25 '18 at 04:42
That answer is misleading (because the correlated subquery can still be the fastest query) and Oracle has a much, much smarter optimizer than MySQL. – Gordon Linoff Apr 25 '18 at 12:45

Select IN from multiple columns

2 Answers2