I have an import table with user data, and I need to mark the rows with duplicate field values because they should not be imported.
CREATE TABLE `import` (
ID int(10) unsigned NOT NULL AUTO_INCREMENT,
method varchar(20) DEFAULT NULL,
f1 text,
f2 text,
PRIMARY KEY (ID)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
The field f1
can hold duplicate values. The query to select them works:
SELECT id, a.f1 FROM import a INNER JOIN
(
SELECT f1 FROM import
WHERE f1 IS NOT NULL AND f1 != ''
GROUP BY f1
HAVING COUNT(id) > 1
) b
ON a.f1 = b.f1
The problem is the outer query to do the update. Here's the whole shebang:
UPDATE import SET method = 'ERR_DUPLICATE' WHERE import.id IN
(
SELECT id FROM
(
SELECT id, a.f1 FROM import a INNER JOIN
(
SELECT f1 FROM import
WHERE f1 IS NOT NULL AND f1 != ''
GROUP BY f1
HAVING COUNT(id) > 1
) b
ON a.f1 = b.f1
) c
)
That construction is from MySQL: You can't specify target table 'tasks' for update in FROM clause - which is the error I got before. The above query works but takes 0.5 sec. for a 20,000 row table with about 30 duplicates. I'll have to deal with much bigger import tables, so this is a show stopper.
Any ideas how to speed this up?