0

So we had a duplicate SQL scripts running on our server and didn't realize it till just recently. Essentially I have many rows where there are 2 entries with the same column x (crn).

The initially got entered with the same column y (status) as well. Our application has users update the column y (status). However now we have 2 rows one with a status of 'S' and one with a status of something other than 'S'. My goal:

DELETE everything from the table WHERE there is a duplicate CRN and the STATUS is S. I don't want to delete rows unless there is a duplicate, but if there is, I only want to delete the row with a status of 'S'. Also, I'd rather not delete both records if both have a status of S, but if I do, that isn't such a big deal because I will get the courses again in the next download.

I have started making a select statement to query the rows I want, but don't know how to do the ONLY SELECT IF DUPLICATE EXISTS part. I feel like I need to UNION or LEFT JOIN or something to only get records if a duplicate exists.

SELECT * FROM 
cas_usuECourses 
WHERE 
crn IN (SELECT crn FROM cas_usuECourses GROUP BY crn having count(1) > 1)
AND status = 'S'
AND termCode = 201320

EDIT: Is there a way to say... the above, but if both dups have 'S' only delete one of them?

EDIT: I "think" this looks good to me. Any thoughts?

SELECT id FROM (
    SELECT id, Row_Number() Over (Partition By crn ORDER BY id DESC) as ranking
    FROM cas_usuECourses
    WHERE status = 'S'
    AND termCode = 201320
) as ranking

WHERE ranking = 1

I think this will give me all the ids where the status is 'S', and if there are two with 'S' this will give me the one that was created second. I found out that every entry in our termCode has duplicates, so... don't need to worry about checking for the duplicates.

Leeish
  • 5,203
  • 2
  • 17
  • 45
  • What have searched/tried? What RDMS? Possible duplicates: [here](http://stackoverflow.com/questions/18932/how-can-i-remove-duplicate-rows), [here](http://stackoverflow.com/questions/18932/how-can-i-remove-duplicate-rows/679449) and [here](http://stackoverflow.com/questions/6471463/how-to-delete-duplicates-in-sql-table-based-on-multiple-fields/679449). – Kermit Oct 26 '12 at 17:36
  • Well, I've been searching for information on duplicates on here, but don't see anything that is like what I need. – Leeish Oct 26 '12 at 17:39
  • The information on here would get you on the right path. – Kermit Oct 26 '12 at 17:40
  • Turbot What is RDBMS? Randy, I don't understand 1 and * return the same number of results. The above gives me almost good data, I just want to only delete one row where the status = 'S' not both. – Leeish Oct 26 '12 at 17:51
  • @Leeish what database are you using? MySQL? SQL Server? Oracle? – Matthew Oct 26 '12 at 17:56

1 Answers1

0

If you could add one column to your table and fill it with distinct values, it would be trivial - you could target each row.

Otherwise, after your initial step, I would generally open a cursor on your subselect with status S to select only crn's where both statuses are 'S', and in each loop iteration delete top 1 record with appropriate crn. That way you can get rid of duplicate crn/status pairs.

OzrenTkalcecKrznaric
  • 5,535
  • 4
  • 34
  • 57
  • @Leeish - cursor is inefficient but sometimes neccessary (at least in SQL Server) way to iterate through database table records. Please see example [here](http://www.mssqltips.com/sqlservertip/1599/sql-server-cursor-example/). Using cursor you could fix this. And don't forget to use a transaction to test your code if you're running it on a production server! – OzrenTkalcecKrznaric Oct 26 '12 at 23:34