I have one huge table that I would like to make smaller. It has ~230 Million rows.
Both columns are indexed. The structure is:
+--------------+------------+
| id_my_value | id_ref |
+--------------+------------+
| YYYY | XXXX |
+--------------+------------+
I would have to remove the values that have a particular "id_ref" value. I have tried the following:
sql = f"SELECT id_ref FROM REFS"
cursor.execute(sql)
refs = cursor.fetchall()
limit = 1000
for current in refs:
id = current["id_ref"]
sql = f"DELETE FROM MY_VALUES WHERE id_ref = {id} LIMIT {limit}"
while True:
cursor.execute(sql)
mydb.commit()
if cursor.rowcount == 0:
break
Regardless the value I set to "limit" the query is tremendously slow:
DELETE FROM MY_VALUES WHERE id_ref = XXXX LIMIT 10;
I have also tried the other way around. Select the id_value associated with a particular id_ref, and delete:
SELECT id_value FROM MY_VALUES WHERE id_ref = XXXX LIMIT 10
DELETE FROM MY_VALUES WHERE id_value = YYYY
Here is my EXPLAIN.
EXPLAIN DELETE FROM MY_VALUES WHERE id_ref = YYYY LIMIT 1000;
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+-------+---------------+------------+---------+-------+----------+----------+-------------+
| 1 | DELETE | MY_VALUES | NULL | range | id_ref | id_ref | 5 | const | 20647922 | 100.00 | Using where |
It does use the right INDEX.
I would not have any problem to have this operation running for days ont he server.
- What is the right way to approach this "cleaning"?
EDIT
Here is the output from SHOW CREATE TABLE MY_VALUES
MY_VALUES | CREATE TABLE `MY_VALUES` (
`id_my_value` int NOT NULL AUTO_INCREMENT,
`id_document` int NOT NULL,
`id_ref` int DEFAULT NULL,
`value` mediumtext CHARACTER SET utf8 COLLATE utf8_spanish_ci,
`weigth` int DEFAULT NULL,
`id_analysis` int DEFAULT NULL,
`url` text CHARACTER SET utf8 COLLATE utf8_spanish_ci,
`domain` varchar(64) CHARACTER SET utf8 COLLATE utf8_spanish_ci DEFAULT NULL,
`filetype` varchar(16) CHARACTER SET utf8 COLLATE utf8_spanish_ci DEFAULT NULL,
`id_domain` int DEFAULT NULL,
`id_city` int DEFAULT NULL,
`city_name` varchar(32) CHARACTER SET utf8 COLLATE utf8_spanish_ci DEFAULT NULL,
`is_hidden` tinyint NOT NULL DEFAULT '0',
`id_company` int DEFAULT NULL,
`is_hidden_by_user` tinyint(1) NOT NULL DEFAULT '0',
PRIMARY KEY (`id_my_value`),
KEY `id_ref` (`id_ref`),
KEY `id_document` (`id_document`),
KEY `id_analysis` (`id_analysis`),
KEY `weigth` (`weigth`),
KEY `id_domain` (`id_domain`),
KEY `id_city` (`id_city`),
KEY `id_company` (`id_company`),
KEY `value` (`value`(15))
UPDATE
I just tried to remove one register:
DELETE FROM MY_VALUES WHERE id_MY_VALUE = 8
That operation takes "forever". To prevent a timeout, I followed this SO question ,so I have set:
show variables like 'innodb_lock_wait_timeout';
+--------------------------+--------+
| Variable_name | Value |
+--------------------------+--------+
| innodb_lock_wait_timeout | 100000 |
+--------------------------+--------+