1

i have two tables that are joined on a field rowid

i need to delete rows from calibration that match a specific condition based on a different table

there are approximately 50000 rows in batchinfo and 3 million rows in calibration

here is my sql statement:

delete from calibration where rowid in (

select calibration.rowid from batchinfo 
join calibration on batchinfo.rowid=calibration.rowid
where reporttime not like '%2011%')

this is taking hours!!

what can i do to speed this up?

Alex Gordon
  • 57,446
  • 287
  • 670
  • 1,062
  • 2
    is `reporttime` a date posing as a string? – Matthew Nov 23 '11 at 18:19
  • 1
    Well, the `LIKE '%2011%'` is not going to be able to use any indices at all - due to the fact you have the leading `%` in there. Maybe try to find to a more suitable WHERE clause to speed things up – marc_s Nov 23 '11 at 18:24
  • The slowness is caused by the fact that your `WHERE` criteria is not sargable. Is there any way to get the rows you want without parsing a string? – Matthew Nov 23 '11 at 18:26
  • 1
    regardless of this `delete` you should fix the `reporttime` column. if `reporttime` contains string and date data, create an additional column of a date or datetime type to store the date and index it. if `reporttime` contains only the date date convert it to a date or datetime type. – KM. Nov 23 '11 at 18:35
  • 1
    @KM. OP may not have control over the table structure. I know I've been stuck in that crappy situation before =/ – Josh Darnell Nov 23 '11 at 21:09

5 Answers5

6
delete c
from batchinfo b
join calibration c
    on batchinfo.rowid=calibration.rowid
where reporttime not like '%2011%'
msmucker0527
  • 5,164
  • 2
  • 22
  • 36
3

Handle it in batches by adding a Top 1000 to your select statment then simply run again and again until there is nothing else to delete.

DELETE FROM calibration 
WHERE rowid IN (SELECT TOP 1000 
                  calibration.rowid 
                FROM batchinfo 
                JOIN calibration ON batchinfo.rowid=calibration.rowid
                WHERE reporttime NOT LIKE '%2011%')
John Hartsock
  • 85,422
  • 23
  • 131
  • 146
  • ...and run it repeatedly until there are no results? – Matthew Nov 23 '11 at 18:20
  • 1
    That's not fixing the poor performance, using a join would be much more effective. – Maess Nov 23 '11 at 18:22
  • 1
    in query analyzer you could just add "GO 10" to the end of the batch and the statement will repeat 10 (or whatever your multiple) times. – EBarr Nov 23 '11 at 18:23
  • @Maess this is dependent upon the relationship between the two tables which was not mentioned. It could be 1 to 1 or 1 to many but it also could be many to many (We just dont know from OPs question). Because of this We can limit the result set within the IN statement by applying TOP. – John Hartsock Nov 23 '11 at 18:24
  • @JohnHartsock True, that's why I didn't ask for a -1. – Maess Nov 23 '11 at 18:39
3

You should simply delete using the JOIN and forget the IN statement.
See this question for details:
T-SQL: Selecting rows to delete via joins

Community
  • 1
  • 1
Matthew
  • 10,244
  • 5
  • 49
  • 104
2

Why do you need the join? Won't this work?

delete from calibration where rowid in (
select batchinfo.rowid from batchinfo 
where reporttime not like '%2011%')

Or, if reporttime is part of calibration:

delete from calibration where rowid in (
select batchinfo.rowid from batchinfo ) and reporttime not like '%2011%'
Jake Feasel
  • 16,785
  • 5
  • 53
  • 66
1

Could you use WHERE EXISTS instead?

DELETE 
FROM calibration 
WHERE EXISTS 
    (SELECT batchinfo.rowid 
     FROM batchinfo  
     WHERE 
         batchinfo.rowid = calibaration.rowid AND
         reporttime NOT LIKE '%2011%')

Note: This is just another option, it looks like the JOINs above should work for you. But, you know...variety is the spice of life =)

Josh Darnell
  • 11,304
  • 9
  • 38
  • 66