This is a follow-up to my original question posted here.
My setup:
Amazon RDS
MySQL Workbench, with the connection timeout set to max
I am trying to DELETE duplicate rows from my database table, which has close to 1 million rows.
The table looks like this; mytext is a MEDIUMTEXT column and id is AUTO_INCREMENT:
+---+-----+-----+------+-------+
|id |fname|lname|mytext|morevar|
|---|-----|-----|------|-------|
| 1 | joe | min | abc | 123 |
| 2 | joe | min | abc | 123 |
| 3 | mar | kam | def | 789 |
| 4 | kel | smi | ghi | 456 |
+---+-----+-----+------+-------+
I would like to end up with a table like this:
+---+-----+-----+------+-------+
|id |fname|lname|mytext|morevar|
|---|-----|-----|------|-------|
| 1 | joe | min | abc | 123 |
| 3 | mar | kam | def | 789 |
| 4 | kel | smi | ghi | 456 |
+---+-----+-----+------+-------+
This solution started working, but after about 10,000 rows each pass takes longer and the process eventually hangs. I let it run for over 20 hours, deleting in chunks of 10,000 rows via a WHERE condition (I thought deleting in chunks would be safer). But even with the WHERE clause the system hangs, and I then have to reboot the RDS instance to access the database.
DELETE
FROM yourTable
WHERE id > 40000
  AND id <= 50000          -- current 10,000-row chunk
  AND id NOT IN
  (
    -- extra wrapper needed so MySQL allows selecting from the same table I am deleting from
    SELECT MAXID FROM
    (
      SELECT MAX(id) AS MAXID   -- keeps the newest copy of each mytext value;
      FROM yourTable            -- my desired output above keeps the oldest (id 1),
      GROUP BY mytext           -- so maybe this should really be MIN(id)
    ) AS temp_table
  )
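For reference, I have read that a LEFT JOIN anti-join can perform better than NOT IN with a subquery in MySQL. Would rewriting the delete like this help? (Just an untested sketch of the same logic.)

DELETE t
FROM yourTable t
LEFT JOIN
(
    -- one row per distinct mytext: the id to keep
    SELECT MAX(id) AS maxid
    FROM yourTable
    GROUP BY mytext
) k ON t.id = k.maxid
WHERE k.maxid IS NULL        -- not a keeper, so it is a duplicate
  AND t.id > 40000
  AND t.id <= 50000;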
Here's the CREATE statement:
CREATE TABLE `yourTable` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `fname` varchar(45) DEFAULT NULL,
  `lname` varchar(45) DEFAULT NULL,
  `mytext` mediumtext,
  `morevar` bigint(20) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=latin1$$
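One thing I notice is that there is no index on mytext, so I assume the GROUP BY mytext has to scan and sort the whole table on every chunk. Would a prefix index like the one below help the GROUP BY, or can MySQL not use a prefix index for grouping? (Untested; the prefix length 100 is an arbitrary guess on my part.)

ALTER TABLE `yourTable`
  ADD INDEX `idx_mytext` (`mytext`(100));  -- MEDIUMTEXT can only be indexed with a prefix length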
Questions
Is this SQL command OK for handling large numbers of rows and for what I am trying to achieve, or is there a better solution?
How long would it normally take to process 1 million rows?
Is there a setting, like php.ini, inside Amazon RDS for manipulating large data sets?
Or would it make more sense to create a new table and insert all rows excluding the duplicates? (Sketched below.)
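To make that last option concrete, here is roughly what I have in mind, using the schema above (an untested sketch; yourTable_dedup is just a placeholder name, and I use MIN(id) so the oldest copy survives, matching my desired output):

CREATE TABLE yourTable_dedup LIKE yourTable;

-- copy over one row per distinct mytext value, keeping the oldest id
INSERT INTO yourTable_dedup
SELECT t.*
FROM yourTable t
JOIN
(
    SELECT MIN(id) AS keep_id
    FROM yourTable
    GROUP BY mytext
) k ON t.id = k.keep_id;

-- swap the tables once the copy looks right
RENAME TABLE yourTable TO yourTable_old, yourTable_dedup TO yourTable;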