
I have 2 tables which I join this way:

UPDATE wm_rendi rendi
JOIN wm_acquisti ON rendi.isrc_isbn = wm_acquisti.barcode
SET rendi.user_id = wm_acquisti.user_id
WHERE rendi.user_id IS NULL

This query takes 24 hours to complete...

This is the wm_acquisti table, with 18'000+ records:

USER_ID    BARCODE
12         600001
56         500003
12         600007
12         800018
32         100004
56         500007
...        ...

This is the wm_rendi table, with 910'040 records up to now:

USER_ID      ISRC_ISBN
NULL         600001
NULL         500003
...          ...

So, because this query takes so long, I thought I could shorten the wm_acquisti table this way:

USER_ID    BARCODE
12         600001|600007|800018
56         500003|500007
32         100004

The point is that I have to join these two tables on the barcode, but it takes way too long.

Is there a way of obtaining the same join using the shorter table (pipe-separated barcodes), or any other method?

Mr.Web
  • Why would you ever want to denormalize such a small table? You won't really gain anything but a lot of problems and it will make using the database harder. The answer to your question anyway would be to use a query or function to transform the pipe separated values into a proper table and then do the join the normal way... – jpw Mar 31 '15 at 22:30
  • See [Is storing a delimited list in a database column really that bad?](http://stackoverflow.com/q/3653462) – eggyal Mar 31 '15 at 22:34
  • You'll probably want to use [`FIND_IN_SET()`](http://www.w3resource.com/mysql/string-functions/mysql-find_in_set-function.php) in your join condition – scrowler Mar 31 '15 at 22:37
  • I see both of your points, but then there must be something wrong. I have the wm_rendi with 910'040 records for now and it takes 24hrs to do the above `JOIN`... (and is ONE of 4 operations I have to do on that table once every three months). How could it be? - I edited the original post. – Mr.Web Apr 01 '15 at 13:00

1 Answer


You could write a query that would emulate that join operation. But you don't want to. It will be slow, it will be ugly, and it will be unmaintainable.
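
For completeness, such a query would look something like this sketch (assuming MySQL; `FIND_IN_SET()`, which a commenter mentioned, expects a comma-separated list, hence the `REPLACE()` to turn the pipes into commas). Because the join condition wraps the column in functions, no index can be used, so it would be even slower than the plain join:

-- Sketch only: joins on membership in the pipe-separated list.
-- FIND_IN_SET() works on comma-separated lists, so the pipes
-- are converted with REPLACE() first. This defeats any index.
UPDATE wm_rendi rendi
JOIN wm_acquisti
  ON FIND_IN_SET(rendi.isrc_isbn, REPLACE(wm_acquisti.barcode, '|', ',')) > 0
SET rendi.user_id = wm_acquisti.user_id
WHERE rendi.user_id IS NULL;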

Remember, the speed of a join is determined not by the number of rows in the tables, but by the number of binary comparison operations that must be performed. The things that affect this are:

  1. The number of data points in each table. How those points are stored is immaterial: packing the barcodes into pipe-separated strings does not reduce the number of comparisons that have to happen.

  2. INDEXES. Remember that a join without indexes takes O(n×m) binary comparisons, where n is the number of rows in the smaller table and m is the number of rows in the larger table. On the other hand, an index lookup in a table of m rows is O(log(m)), so if the query optimizer is smart, the join should take O(n log(m)) operations. In your case, an indexed field will be around 45,000 times faster. So if you don't have indexes, or the RDBMS isn't using them, that's a red flag (see the sketch below for how to check).
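
One way to check, as a sketch (assuming MySQL; older versions cannot EXPLAIN an UPDATE, so this uses the equivalent SELECT). Look at the `key` and `rows` columns of the output; a NULL `key` on either table means the join is not using an index:

-- Sketch: EXPLAIN the SELECT equivalent of the UPDATE.
-- A NULL in the "key" column means no index is used for that table.
EXPLAIN
SELECT rendi.user_id, wm_acquisti.user_id
FROM wm_rendi rendi
JOIN wm_acquisti ON rendi.isrc_isbn = wm_acquisti.barcode
WHERE rendi.user_id IS NULL;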

Likewise, make sure that wm_rendi.USER_ID is indexed; that will speed up the NULL lookup by an order of magnitude.
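
A minimal sketch of the DDL (assuming MySQL and the column names from the question; the index names here are made up):

-- Hypothetical index names; index the join key on both sides
CREATE INDEX idx_acquisti_barcode ON wm_acquisti (barcode);
CREATE INDEX idx_rendi_isrc_isbn ON wm_rendi (isrc_isbn);

-- Index the filter column so the WHERE user_id IS NULL scan is cheap
CREATE INDEX idx_rendi_user_id ON wm_rendi (user_id);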

Make sure all columns have unique constraints, of course.

As several folks have pointed out, there is never a good reason to store delimited data in a table. It breaks normalization, slows down queries, and generally makes everything worse.

Nick Bailey
  • I see your points, but then there must be something wrong. I have the wm_rendi with 910'040 records for now and it takes 24hrs to do the above JOIN... (and is ONE of 4 operations I have to do on that table once every three months). How could it be? The wm_acquisti is 18000 but the other one is huge... - I edited the original post. – Mr.Web Apr 01 '15 at 13:00
  • Updated with some other thoughts about possible performance issues. – Nick Bailey Apr 01 '15 at 14:14
  • F@&€K meee! How could I not know this was the reason!!?? Index! U saved my life. Now indexed takes 28sec... – Mr.Web Apr 01 '15 at 18:39