Slow MySQL query on update statement

Question

I am trying to move some data from a database to another. I am currently having over a million entries in my database and I was expecting this to take long but already passed 50min and no result :) . Here is my query:

UPDATE xxx.product AS p 
LEFT JOIN xx.tof_art_lookup AS l ON p.model_view = l.ARL_SEARCH_NUMBER 
SET p.model = l.ARL_DISPLAY_NR 
WHERE p.model_view = l.ARL_SEARCH_NUMBER;

Any help on how to improve this query will be welcome. Thanks in advance!

OUTER JOINs on updates are vanishingly rare (except IS NULLS). Are you sure that's what you want? If not, switch to an INNER JOIN. — Strawberry, Nov 12 '13 at 12:21
Would you recommend to get the data with php and save it to an array and then do the UPDATE ? — Ionut Tatu, Nov 12 '13 at 12:21
50 minutes? That's 300 updates per second if you have a million rows. Hang in there. A job like this could take overnight. Also, you need to let us know stuff like whether you're using InnoDB. — O. Jones, Nov 12 '13 at 12:22
Can you give details: 1) do joined columns or ARL_SEARCH_NUMBER contain NULLS? 2) how much (approx) rows in each table? 3) does JOIN actually give most of product rows or only a few (hit rate)? — LINQ2Vodka, Nov 12 '13 at 12:26
If you want, you could try to break the data into smaller chunks. Remember it is building a log so that all of these updates can be rolled back (ACID transaction). But why the LEFT VIEW? You could just as easily update the field to NULL then run the update with a regular JOIN. — AgRizzo, Nov 12 '13 at 12:28
1. ARL_SEARCH_NUMBER does not contain NULL. 2. Over a million maybe 2 in each table. 3. Yes, most of them. — Ionut Tatu, Nov 12 '13 at 12:29
Also, if you're updating column that is under clustered index then AFAIK server will need to recreate row. Let anyone know if i'm wrong. — LINQ2Vodka, Nov 12 '13 at 12:40

LINQ2Vodka · Answer 1 · 2013-11-12T13:04:59.310

2

Indexes on p.model_view, l.ARL_SEARCH_NUMBER if you not gonna get rid of JOINs.
Actually, it might be optimized depending on actual data amounts and their values (NULLs presence) by use of:
1. Monitoring query execution plan and , if it's not good, putting query hints for compiler or exchange JOINs for subqueries so compiler uses another type of join inside it (merge/nested loops/hashs/whatever)
2. Making a stored procedure with more comlicated but faster logic
3. Doing updates by small portions

edited Nov 12 '13 at 13:04

answered Nov 12 '13 at 12:21

LINQ2Vodka

2,996
2
27
47

Thanks for your help. I will try to see what taking so long and will be back with updates or hopefully to close this thread :D – Ionut Tatu Nov 12 '13 at 12:39

score 1 · Answer 2 · edited May 23 '17 at 12:12

Identify what makes slow.

check JOIN is optimized

run SELECT only:

SELECT COUNT(*)
FROM xxx.product p LEFT JOIN xx.tof_art_lookup l 
  ON p.model_view = l.ARL_SEARCH_NUMBER;

how long takes? and EXPLAIN SELECT ... check proper INDEX is used for JOIN.

If everything is fine for JOIN, then UPDATEing row is slow. this situation is hard to make things faster.

UPDATE = DELETE and INSERT

I didn't tried this. but sometimes, this strategy is faster.. UPDATE is DELETE old row and INSERT new row using new value.

// CREATE new table and INSERT
CREATE TABLE xxx.new_product
SELECT p.model_model, l. ARL_DISPLAY_NR, ... 
FROM xxx.product p LEFT JOIN xx.tof_art_lookup l 
  ON p.model_view = l.ARL_SEARCH_NUMBER;

// drop xxx.procuct
// rename xxx.new_product to xxx.product

divide table into small chunk, and run concurrently

I think your job is CPU bounded and your UPDATE query uses just one CPU can't have benefit many cores. xxx.product TABLE has no constraint for join, there for 1M rows are updated sequencially

My suggestion following.

give some conditions to xxx.product so that xxx.product divided 20 group. (I don't no which column would be better for you, as I have no information about xxx.product)

then run 20 queries at once concurrently.

for example:

// for 1st chunk
UPDATE xxx.product AS p 
...
WHERE p.model_view = l.ARL_SEARCH_NUMBER
  AND p.column BETWEEN val1 AND val2; <= this condition spliting xxx.product

// for 2nd chunk
UPDATE xxx.product AS p 
...
WHERE p.model_view = l.ARL_SEARCH_NUMBER
  AND p.column BETWEEN val2 AND val3;

...
...

// for 20th chunk
UPDATE xxx.product AS p 
...
WHERE p.model_view = l.ARL_SEARCH_NUMBER
  AND p.column BETWEEN val19 AND val20;

It is important to find BETWEEN value distribute table evenly. Histogram may help you. Getting data for histogram plot

Thanks for the help. I will give it a try and check to see whats taking so long. — Ionut Tatu, Nov 12 '13 at 12:38
@Ionut Laurentiu I'm looking forward to your reply! even if my suggestion does not improved your query. — Jason Heo, Nov 12 '13 at 12:43

Slow MySQL query on update statement

2 Answers2

check JOIN is optimized

UPDATE = DELETE and INSERT

divide table into small chunk, and run concurrently

Linked