0

There was a bug in code that was creating duplicate rows. Here is a sample of two of the columns in my table as it looked before I removed the duplicates. (there is an identity column, ID and other columns that are irrelevant)

SampleID       ProjectID
0              c0f5e4cc-7664-4d70-ab3f-6ff832aa8ece
0              c0f5e4cc-7664-4d70-ab3f-6ff832aa8ece
0              c0f5e4cc-7664-4d70-ab3f-6ff832aa8ece
0              c0f5e4cc-7664-4d70-ab3f-6ff832aa8ece
1              c0f5e4cc-7664-4d70-ab3f-6ff832aa8ece
1              c0f5e4cc-7664-4d70-ab3f-6ff832aa8ece
1              c0f5e4cc-7664-4d70-ab3f-6ff832aa8ece
1              c0f5e4cc-7664-4d70-ab3f-6ff832aa8ece
1              c0f5e4cc-7664-4d70-ab3f-6ff832aa8ece
1              c0f5e4cc-7664-4d70-ab3f-6ff832aa8ece
0              00e95061-96e2-4db5-999e-5ca629561c50
0              00e95061-96e2-4db5-999e-5ca629561c50
0              00e95061-96e2-4db5-999e-5ca629561c50
1              00e95061-96e2-4db5-999e-5ca629561c50
1              00e95061-96e2-4db5-999e-5ca629561c50
1              00e95061-96e2-4db5-999e-5ca629561c50
1              00e95061-96e2-4db5-999e-5ca629561c50
1              00e95061-96e2-4db5-999e-5ca629561c50

I needed to remove duplicates from the table so that I am left with a table with these rows, and was eventually to accomplish this, but my question is why the query that I originally tried didn't work as expected.

SampleID       ProjectID
0              c0f5e4cc-7664-4d70-ab3f-6ff832aa8ece
1              c0f5e4cc-7664-4d70-ab3f-6ff832aa8ece
0              00e95061-96e2-4db5-999e-5ca629561c50
1              00e95061-96e2-4db5-999e-5ca629561c50

The query that I tried is

DELETE FROM INS_Sample WHERE ID NOT IN 
(SELECT MIN(ID) FROM INS_Sample GROUP BY SampleID, ProjectID);

and I found it in an answer to this question. When I ran this query, it seemed to running indefinitely so I stopped it and the query had actually been inserting more duplicate rows into my table. I had to run SELECT MIN(ID) FROM INS_Sample GROUP BY SampleID, ProjectID first and make a comma seperated list of the resulting IDs to paste into my original query so that it looked like this

DELETE FROM INS_Sample WHERE ID NOT IN 
(1234, 5678, /*a lot more IDs*/);

before I could get the results I want. Why doesn't my first query work as it should, and how in the world were rows getting inserted into the table? There isn't anything else in my query editor other than a select to see what is in the table.

Community
  • 1
  • 1
Sam W
  • 599
  • 2
  • 16

3 Answers3

0

You are executing DELETE statement and thus it perform DELETE in your table but still if you say so that it does INSERT some record then there can be only one way and that is TRIGGER. Check and see if you have defined trigger (specifically AFTER DELETE trigger) in your INS_Sample table .

Rahul
  • 76,197
  • 13
  • 71
  • 125
  • If there is a `TRIGGER`, I'll have to ask another engineer that isn't in at the moment. It's an Azure DB so I wonder if maybe they have something in place to restore deleted rows? I'm grasping at straws at this point. – Sam W Dec 21 '16 at 20:05
  • @Sam, there would be backup of your table either full / differential backup. You can restore from there hopefully. – Rahul Dec 21 '16 at 20:14
  • I actually don't want to restore the duplicate rows; I needed to delete them. I just can't figure out why the original query I ran was inserting the rows again instead of deleting them. I'm running more data with the bugged software to create more duplicate rows in the table and I'm going to try to reproduce it. – Sam W Dec 21 '16 at 20:16
0

A delete query cannot insert rows into a table . . . well, unless you have a trigger on the table that implements such behavior.

Your query might hang and do nothing to the table, if no ids match. You say that id is an identity, but you don't specify if it is a primary key. If id is ever NULL in the subquery, then the NOT IN will never evaluate to true.

I recommend using something like this instead:

DELETE
    FROM INS_Sample s
WHERE s.id > (SELECT MIN(ID)
              FROM INS_Sample s2
              WHERE s2.SampleId = s.SampleID AND s2.ProjectID = s.ProjectID
             );
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
0

you can try query like this

;with cte as (
select RowN = row_number() over (partition by sampleid, projectid order by sampleid, projectid) from yourproject
) delete from cte where RowN > 1
Kannan Kandasamy
  • 13,405
  • 3
  • 25
  • 38
  • I've already managed to delete the rows; that wasn't my question. I was asking about why the query I tried first didn't behave as expected. – Sam W Dec 21 '16 at 20:24