3

We hold a DB of ~350k email addresses for a client.

They have supplied me with a couple of excel spreadsheets with a combined ~150k email addresses they want to remove (long story short, it was a spammy email list that they had gotten hold of and want to remove the bounce backs).

Using the wonders of find and replace, I have got a query built to perform this huge delete but the query takes too long and times out (no surprise there).

The query I have built is in the form of:

DELETE FROM emaillist
WHERE intListID = 68
AND
(strEmailAddress = "aaaa@aaaa.com"
OR strEmailAddress = "aaab@aaaa.com"
OR strEmailAddress = "aaac@aaaa.com"
......
etc etc etc
......
OR strEmailAddress = "zzzy@zzzz.com"
OR strEmailAddress = "zzzz@zzzz.com"
)

Is there a better way to do this?

Fraser
  • 14,036
  • 22
  • 73
  • 118

3 Answers3

10

Put the emails you want to delete in a temporary table. Then you can remove the emails from your original table by using a join

DELETE e 
FROM emaillist e
INNER JOIN temp_table t ON t.strEmailAddress = e.strEmailAddress
WHERE intListID = 68

After that drop the temp table and your done.

juergen d
  • 201,996
  • 37
  • 293
  • 362
2

I would suggest the following steps:

  1. Load the Excel emails into a "todelete"
  2. Build an index on the email field in the "todelete" table
  3. Create a new table with the values you want:

    create table newemails as select * from emaillist where not exists (select 1 from todelete where todelete.email = emaillist.email);

  4. Truncate emaillist

  5. Insert the list back in

The said, you might find that steps (1) and (2) are sufficient, with the query:

delete from emaillist
    where not exists (select 1 from todelete where todelete.email = emaillist.email);

However, deleting 150,000 rows generates a lot of traffic on the log. Truncation and re-insertion saves this overhead.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
0

You could import the list in a temporary table and try something like:

DELETE FROM emaillist
WHERE intListID = 68
AND strEmailAddress in (Select emailAddress From temporaryTable)

An index on strEmailAddress (even created for the duration of the query) could help a lot.

Tasos P.
  • 3,994
  • 2
  • 21
  • 41