7

I have a table called "downloads" with two foreign key columns -- "user_id" and "item_id". I need to select all rows from that table and remove the rows where the User or the Item in question no longer exists. (Look up the User and if it's not found, delete the row in "downloads", then look up the Item and if it's not found, delete the row in "downloads").

It's 3.4 million rows, so all my scripted solutions have been taking 6+ hours. I'm hoping there's a faster, SQL-only way to do this?

Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
jmccartie
  • 4,956
  • 8
  • 50
  • 71

5 Answers5

30

use two anti joins and or them together:

delete from your_table
where user_id not in (select id from users_table)
or item_id not in (select id from items_table)

once that's done, consider adding two foreign keys, each with an on delete cascade clause. it'll do this for you automatically.

Denis de Bernardy
  • 75,850
  • 13
  • 131
  • 154
  • 2
    +1 for suggesting foreign keys to solve the problem not the sympton –  May 14 '11 at 15:42
3
delete from your_table where user_id not in (select id from users_table) or item_id not in (select id from items_table)
Adrian Serafin
  • 7,665
  • 5
  • 46
  • 67
0

For future reference. For these kind of long operations. It is possible to optimise the server independently of the SQL. For example detach the sql service, defrag the system disk, if you can ensure the sql log files are on separate disk drive to the drive where database is. This will at least reduce the pain of these kind of long operations.

Farjad
  • 257
  • 4
  • 9
0

I've found in SQL 2008 R2, if your "in" clause contains a null value (perhaps from a table who has a reference to this key that is nullable), no records will be returned! To correct, just add a clause to your selects in the union part:

delete from SomeTable where Key not in (
  select SomeTableKey from TableB where SomeTableKey is not null
  union
  select SomeTableKey from TableC where SomeTableKey is not null
)
K0D4
  • 2,373
  • 1
  • 27
  • 26
0

think there is no faster solution when there are so many rows that are on your server 157 rows per second

check user id if mysql num rows = 0 than delete the downloads and also check the item_id

there was also a similar question about the performance of myswl num rows

MySQL: Fastest way to count number of rows

edit: think the best is to creatse some triggers so the database server does the job for you

currently i would use a cronjob for the first time

Community
  • 1
  • 1