Is LEFT JOINING two tables with no reference to each other ON string conditions a bad idea?

Question

I'm migrating database between 2 systems using PHP and MySQL.

In the old one I have 3 tables of interest:

t1
id (int)
...

t2
id (int)
t1_id (int)
d (string)
...

t3
id (int)
t1_id (int)
ds (string)
e (int)
...

In the new one I have only t1 and t2

t2.d can have e.g. "abc" or "def" t3.ds can have "abc" or "def" or "abc, def"

I have created the following query:

SELECT
    t2...,
    t3.e
FROM t2
LEFT JOIN t3
    ON t2.id = t3.id
    AND t3.ds LIKE CONCAT("%", t2.d, "%")
WHERE t2.id = ?

The query does work, but I am worried about the performance of this JOIN when I have lots of entries (to migrate I obviously iterate over each of the entries from t1, each having multiple entries in t2 and t3).

So at the bottom, I go back to the question - is it worth to join them like that or should I use a different approach, like separate query or data manipulation on PHP level?

Here's MySQL EXPLAIN if that's any relevant (unfortunately it doesn't mean too much to me, so I appreciate any help):

id | select_type | table | type | possible_keys        | key           | key_len | ref                    | rows  | Extra
1  | SIMPLE      | modi  | ref  | PRIMARY,order_number | order_number  | 4       | const                  | 3     | Using temporary; Using filesort
1  | SIMPLE      | ai    | ref  | detail_number        | detail_number | 4       | max.modi.detail_number | 1     | NULL
1  | SIMPLE      | edi   | ALL  | NULL                 | NULL          | NULL    | NULL                   | 26389 | Using where; Using join buffer (Block Nested Loop)

have you considered using find_in_set? http://stackoverflow.com/questions/11463090/single-mysql-field-with-comma-separated-values — xQbert, Jul 17 '15 at 12:07

score 1 · Accepted Answer · edited Jan 07 '22 at 15:36

1

If you are concerned about performance, then do not store lists in a string. You should have a junction table, with one row per element of the list. In other words, 'abc, def' is a no-no. Another table with two rows, one for 'abc' and another for 'def' is the way to go.

Because you have a data structure that is not optimized for SQL, there is little you can do from a performance perspective. The like is probably about as good as you can do.

edited Jan 07 '22 at 15:36

Nimantha

6,405
6
28
69

answered Jul 17 '15 at 11:41

Gordon Linoff

1,242,037
58
646
786

The legacy system has many, many bad solutions and there's nothing to do about this, rather than building the new system. The new system doesn't have this problem, yet I still have to migrate lots of data between them two and mainly due to incompatibilities between the systems, the script has to be run many times and hence the need/will to try and optimise it. – osh Jul 17 '15 at 11:56
Unfortunately, you cannot really optimize these types of queries. If performance is really, really important, you might want to use a tool such a python to read the data and create a junction table. Then add the appropriate indexes to the table. The modified queries should be fast. – Gordon Linoff Jul 17 '15 at 12:11

score 0 · Answer 2 · answered Jul 17 '15 at 11:43

0

If you are having index proper in t2 and t3 table then there is no issue. Please note that while doing the Left join the Duplicate entries won't insert again and again

answered Jul 17 '15 at 11:43

Krishnadas V R

1
1

Can you elaborate on duplicating entries? Do you mean that if a row from t3 was matched once, it'll be skipped next time, even if it could be matched another time? – osh Jul 17 '15 at 12:03

Is LEFT JOINING two tables with no reference to each other ON string conditions a bad idea?

2 Answers2