
(I found a similar question already exists, but was not happy with the level of detail, so I came here for help; forgive my ignorance.)

DELETE FROM supportrequestresponse # ~3 million records
WHERE SupportRequestID NOT IN (
  SELECT SR.SupportRequestID
  FROM supportrequest AS SR # ~1 million records
)

Or

DELETE SRR
FROM supportrequestresponse AS SRR # ~3 million records
LEFT JOIN supportrequest AS SR
  ON SR.SupportRequestID = SRR.SupportRequestID # ~1 million records
WHERE SR.SupportRequestID IS NULL
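To make the intent of the two forms concrete, here is a minimal sketch using SQLite as a stand-in for MySQL (table and column names taken from the question; the sample row values are made up). It shows the anti-join form deleting exactly the orphaned responses. SQLite has no multi-table DELETE syntax, so the join is wrapped in an IN subquery on the primary key:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE supportrequest (SupportRequestID INTEGER PRIMARY KEY)")
cur.execute("""CREATE TABLE supportrequestresponse (
    SupportRequestResponseID INTEGER PRIMARY KEY,
    SupportRequestID INTEGER NOT NULL)""")
cur.executemany("INSERT INTO supportrequest VALUES (?)", [(1,), (2,)])
# Responses 10 and 11 belong to existing requests; 12 and 13 are orphans.
cur.executemany("INSERT INTO supportrequestresponse VALUES (?, ?)",
                [(10, 1), (11, 2), (12, 99), (13, 98)])

# Anti-join form (query 2), expressed as an IN subquery on the primary key
# because SQLite lacks MySQL's DELETE alias FROM ... JOIN syntax.
cur.execute("""
    DELETE FROM supportrequestresponse
    WHERE SupportRequestResponseID IN (
        SELECT SRR.SupportRequestResponseID
        FROM supportrequestresponse AS SRR
        LEFT JOIN supportrequest AS SR
          ON SR.SupportRequestID = SRR.SupportRequestID
        WHERE SR.SupportRequestID IS NULL)""")
print(cur.rowcount)  # 2 orphans removed
remaining = [r[0] for r in cur.execute(
    "SELECT SupportRequestResponseID FROM supportrequestresponse ORDER BY 1")]
print(remaining)  # [10, 11]
```

One caveat worth remembering with the NOT IN form: if the subquery can ever return a NULL, NOT IN matches nothing at all; here `SupportRequestID` is a primary key, so that cannot happen.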

Specifics

  • Database: MySQL
  • SR.SupportRequestID is INTEGER PRIMARY KEY
  • SRR.SupportRequestID is INTEGER INDEX
  • SR.SupportRequestID & SRR.SupportRequestID are not in FOREIGN KEY relation
  • Both tables contain TEXT columns for subject and message
  • Both tables are InnoDB

Motive: I plan to use this in a periodic clean-up job, likely running every hour or two. It is very important to avoid a lengthy operation and the table locks that come with it, as this is a very busy database and I am already over quota with deadlocks!

EXPLAIN query 1

id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | PRIMARY | supportrequestresponse | ALL | | | | | 410 | Using where
2 | DEPENDENT SUBQUERY | SR | unique_subquery | PRIMARY | PRIMARY | 4 | func | 1 | Using index

EXPLAIN query 2

id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | SIMPLE | SRR | ALL | | | | | 410 |
1 | SIMPLE | SR | eq_ref | PRIMARY | PRIMARY | 4 | SRR.SupportRequestID | 1 | Using where; Using index; Not exists

RUN #2

EXPLAIN query 1

id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | PRIMARY | supportrequestresponse | ALL | | | | | 157209473 | Using where
2 | DEPENDENT SUBQUERY | SR | unique_subquery | PRIMARY | PRIMARY | 4 | func | 1 | Using index; Using where; Full scan on NULL key

EXPLAIN query 2

id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | SIMPLE | SRR | ALL | | | | | 157209476 |
1 | SIMPLE | SR | eq_ref | PRIMARY | PRIMARY | 4 | SRR.SupportRequestID | 1 | Using where; Using index; Not exists
  • @KenWhite I just updated the question with text form of both the query EXPLAIN. – Broken Arrow Jul 18 '21 at 01:05
  • @KenWhite Thanks for the insight. I forgot to mention that the EXPLAIN was run on a very tiny subset of the data, containing just 410 records in the SRR table. Does this information change the picture at all? – Broken Arrow Jul 18 '21 at 01:11
  • It might. The EXPLAIN only evaluates what will happen when the statement is executed; it doesn't do anything to the data. Run it again against the **actual data** so that you get meaningful information back. – Ken White Jul 18 '21 at 01:12
  • @KenWhite Just updated with EXPLAIN query Run #2; does it make any difference? – Broken Arrow Jul 18 '21 at 01:26
  • For future reference, you might want to add the version of MySQL that is involved. – Luuk Jul 18 '21 at 08:29
  • Both queries are inefficient (based on "Time > CPU > Memory") because of this question on SO: too much time is being used by too many people. 1) Create a backup, 2) try query 1, 3) restore the backup, 4) try query 2, 5) learn from the results. – Luuk Jul 18 '21 at 08:32
  • have you tried to 1 - create a new table with the wished ids, 2 - then run the delete, based on an inner join on this table ? (3 - then truncate this 'intermediate' table for next time) – St3an Jul 18 '21 at 08:48
  • I don't like the sound of creating tables, populating them with the outcome, and finally replacing the old table with the new one, when there might be no change at all, or there might be deletions anywhere in the table by row order, and all that once every hour or two! – Broken Arrow Jul 18 '21 at 13:25
  • @Luuk That is going far beyond the scope for a clean up job for a cron :) – Broken Arrow Jul 18 '21 at 13:27
  • What are you actually trying to do? The question is unclear (to me). How many records will be deleted (on average)? `supportrequestresponse` may have 3 million records, but how many records are actually deleted (if you do this every hour)? – Luuk Jul 18 '21 at 15:58
  • @Luuk That is the UNDEFINED part here: there may be NO matching rows to delete, or there may be some eligible rows. Think of the purpose as cleaning the system of ORPHAN/PARENTLESS records, if any. And the actual data contains more than 1500ml real-time records, which is where the question of efficiency gets tough. – Broken Arrow Jul 19 '21 at 13:03
  • This would mean that you could select these records, and copy (the unique id) into a memory table, which should not cause any locking (see: https://stackoverflow.com/questions/917640/any-way-to-select-without-causing-locking-in-mysql). After this you can delete the found rows in an efficient way. – Luuk Jul 19 '21 at 17:25
  • @Luuk I think this is a good idea :) Except for one bottleneck: when the record to delete happens to sit near the end of MySQL's search order, or when there are NO eligible records at all, MySQL basically has to SCAN the entire huge table before giving up with a NO RECORD FOUND decision, right? But still, your idea makes sense. – Broken Arrow Jul 19 '21 at 21:05

2 Answers


I suspect it would be quicker to create a new table, retaining just the rows you wish to keep. Then drop the old table. Then rename the new table.
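A minimal sketch of this copy-and-swap approach, using SQLite as a stand-in for MySQL (the `_new` suffix is my own naming, and `CREATE TABLE ... AS SELECT` drops index/key definitions, so a real job would recreate them). MySQL can make the swap atomic with `RENAME TABLE old TO trash, new TO old`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE supportrequest (SupportRequestID INTEGER PRIMARY KEY);
    CREATE TABLE supportrequestresponse (
        SupportRequestResponseID INTEGER PRIMARY KEY,
        SupportRequestID INTEGER NOT NULL);
    INSERT INTO supportrequest VALUES (1);
    INSERT INTO supportrequestresponse VALUES (10, 1), (12, 99);

    -- Copy only the rows worth keeping (those with a parent request)
    -- into a fresh table, then swap it into place.
    CREATE TABLE supportrequestresponse_new AS
        SELECT SRR.*
        FROM supportrequestresponse AS SRR
        JOIN supportrequest AS SR USING (SupportRequestID);
    DROP TABLE supportrequestresponse;
    ALTER TABLE supportrequestresponse_new RENAME TO supportrequestresponse;
""")
rows = cur.execute("SELECT * FROM supportrequestresponse").fetchall()
print(rows)  # [(10, 1)] -- orphan row 12 never made it into the new table
```

The drawback the comments below point out still holds: between DROP and RENAME the table does not exist, so this only suits a maintenance window, not an hourly job on a busy database.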

  • I'm afraid this would make the table unavailable (thus non-existent) if one does that... but it gave me an idea (see my comment on OP) – St3an Jul 18 '21 at 08:45
  • Sounds a bit difficult for an automated hourly cron job with 3ml+ rows of data, while the intention is to release table locks as soon as possible, as other operations are knocking at the door of the query pool. – Broken Arrow Jul 18 '21 at 13:30

I don't know how to describe this, but it worked in my case; an unbelievable result!

DELETE SRR
FROM supportrequestresponse AS SRR

LEFT JOIN (
    SELECT SRR3.SupportRequestResponseID
    FROM supportrequestresponse AS SRR3
    LEFT JOIN supportrequest AS SR ON SR.SupportRequestID = SRR3.SupportRequestID
    WHERE SR.SupportRequestID IS NULL
    LIMIT 999
) AS SRR2 ON SRR2.SupportRequestResponseID = SRR.SupportRequestResponseID

WHERE SRR2.SupportRequestResponseID IS NOT NULL;

... # Same DELETE statement
... # Same DELETE statement
... # (the same statement, 99 times in total)

A fork of the second (LEFT JOIN) pattern feels more appropriate than letting MySQL match each row against a dynamic list, but that is the minor point. The key change is that I limited the selection to 999 rows at a time, which lets each DELETE finish in the blink of an eye, and, most importantly, I repeated the same DELETE statement 99 times, one after another!

This made it very comfortable for a cron job. The 99 small statements let the database engine release its locks between batches, so other processes don't get stuck waiting for one long DELETE to finish, while each individual DELETE takes very little time. It is something like vehicles passing through a crossroads in zipper fashion.
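Rather than pasting the statement 99 times, the same batching idea can be driven by a loop that stops as soon as a batch deletes nothing. A sketch using SQLite as a stand-in for MySQL (the batch size and row counts are illustrative; SQLite needs the LIMIT inside an IN subquery, whereas MySQL 5.7+ also accepts the multi-table form from the answer):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE supportrequest (SupportRequestID INTEGER PRIMARY KEY);
    CREATE TABLE supportrequestresponse (
        SupportRequestResponseID INTEGER PRIMARY KEY,
        SupportRequestID INTEGER NOT NULL);
    INSERT INTO supportrequest VALUES (1);
""")
# One kept row plus 2500 orphans, so a 999-row batch needs several passes.
cur.execute("INSERT INTO supportrequestresponse VALUES (10, 1)")
cur.executemany("INSERT INTO supportrequestresponse VALUES (?, 999999)",
                [(i,) for i in range(1000, 3500)])

batches = 0
while True:
    cur.execute("""
        DELETE FROM supportrequestresponse
        WHERE SupportRequestResponseID IN (
            SELECT SRR.SupportRequestResponseID
            FROM supportrequestresponse AS SRR
            LEFT JOIN supportrequest AS SR
              ON SR.SupportRequestID = SRR.SupportRequestID
            WHERE SR.SupportRequestID IS NULL
            LIMIT 999)""")
    conn.commit()          # one short transaction per batch keeps locks brief
    if cur.rowcount == 0:  # nothing left to delete: stop, instead of
        break              # blindly running a fixed number of statements
    batches += 1

print(batches)  # 3 batches (999 + 999 + 502 rows) for 2500 orphans
remaining = cur.execute(
    "SELECT COUNT(*) FROM supportrequestresponse").fetchone()[0]
print(remaining)  # 1
```

The early exit also addresses the scan-cost worry from the comments: on a clean table the loop runs the SELECT once, finds nothing, and stops.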
