0

I have two tables T1 and T2 and want to update one field of T1 from T2 where T2 holds massive data. What is more efficient? Updating T1 in a for loop iteration over the values or Left join it with T2 and update.

Please note that i'm updating these tables in a shell script

Sadikhasan
  • 18,365
  • 21
  • 80
  • 122
ivcode
  • 235
  • 2
  • 14
  • possible duplicate of [MySQL: Update rows in table by iterating and joining with another one](http://stackoverflow.com/questions/5532050/mysql-update-rows-in-table-by-iterating-and-joining-with-another-one) – jmail Apr 02 '14 at 07:21
  • It would appear that you either need to iterate over the 6m rows in T2 and do a check on T1 (and possible update) for each one, or you need to do a read on T1 and check each row against T2 to see if it needs an update. The will be VERY slow. The 2nd is doing much the same work as a JOIN but with the over head of a lot of extra parsing of SQL. Both would likely be far slower than doing an update on a JOIN, but with the advantage that they would only be locking the tables for a tiny time when each update is done. – Kickstart Apr 02 '14 at 10:36

2 Answers2

0

In general, the JOIN will always work much better than a loop. The size should not be an issue if it is properly indexed.

smoore4
  • 4,520
  • 3
  • 36
  • 55
  • table T2 is not indexed – ivcode Apr 02 '14 at 07:32
  • That makes all the difference then. A join on an column that is not indexed will have very poor performance if there is a lot of data. Still, set based solutions are nearly always more performant. You should add an index if that is a possibility. – smoore4 Apr 02 '14 at 07:37
  • then using "ALTER TABLE T2 ADD INDEX (COLUMN_NAME);" before the join will improve the performance right. – ivcode Apr 02 '14 at 07:43
  • will it take long to add an index to a column where table is very large? – ivcode Apr 02 '14 at 07:44
  • Yes, your index statement looks right. Define "very large." It will depend on the specs of the server and the size of the data. ALTER TABLE `T2` ADD INDEX `idx_columnname` (`COLUMN_NAME` ASC) ; – smoore4 Apr 02 '14 at 07:50
  • That's a lot of data. Don't add the index during production hours. Check this out: http://stackoverflow.com/questions/4244685/create-an-index-on-a-huge-mysql-production-table-without-table-locking – smoore4 Apr 02 '14 at 08:04
  • well it's a table that has to be online 24/7, so will iterating be more productive? – ivcode Apr 02 '14 at 08:12
  • It would be wise to look at this first: http://stackoverflow.com/questions/4244685/create-an-index-on-a-huge-mysql-production-table-without-table-locking?lq=1 You might consider copying the table with mysqldump and adding an index to the copy, then update the rows. After, add a trigger on T2 to update T1 on Insert or Update. Does T1 also have a large amount of data? – smoore4 Apr 02 '14 at 08:23
  • T1 has only 20000+ rows. yes I looked at your link. Adding an index is not the solution it seems – ivcode Apr 02 '14 at 08:28
0

There is no simple answer which will be more effective, it will depend on table size and data size to which you are going to update in one go.

Suppose you are using innodb engine and trying to update 1,000 or more rows in one go with 2 heavy tables join and it is quite frequent then it will not be good idea on production server as it will lock your table for some time and due to this locking some other operations also can be hit on your production server.

Option1: If you are trying to update few rows and based on proper indexed fields (preferred based on primary key) then you can go with join.

Option2: If you are trying to update a large amount of data based on multiple tables join then below option will be better:

Step1: Create a stored procedure.

Step2: Keep below query results in a cursor.

suppose you want TO UPDATE corresponding field2 DATA of TABLE table2 IN field1 of TABLE table1:

SELECT a.primary_key,b.field2 FROM table1 a JOIN table2 b ON a.primary_key=b.foreign_key WHERE [place CONDITION here IF any...];

Step3: Now update all rows one by one based on primary key using stored values in cursor.

Step4: You can call this stored procedure from your script.

Zafar Malik
  • 6,734
  • 2
  • 19
  • 30