0

My sql table is have not primary key column. I can find dublicate rows but i dont know how i can remove to one to rest; Let me explain;

col1     col2    col3    col4
10       0       1000    1    
10       0       1000    1    --> should be deleted
10       0       1111    2    --> should be deleted
10       1       1000    1
10       2       1000    1
15       0       1000    1
15       0       1000    1    --> should be deleted
16       0       1000    1

I use col1 and col2 for understand row is dublicate. Same time "10" "0" must be unique but tables can contains many "10" or many "0" value.

Thanks.

user2864740
  • 60,010
  • 15
  • 145
  • 220
Black White
  • 700
  • 3
  • 11
  • 31
  • My first thought: Add an auto-increment id and use that one to delete the duplicates. – juergen d Mar 11 '14 at 19:56
  • What SQL are you using ? (MySQL, PostgreSQL, MSSQL, Oracle, ...) – deroby Mar 11 '14 at 19:57
  • @user2672165 Actually i search in net before ask it. But many examples have one different value in columns. My dublicates are totally same sometimes. I must match 1st and 2nd columns.. – Black White Mar 11 '14 at 20:02
  • Ok i can investigate these topics and inform you. – Black White Mar 11 '14 at 20:08
  • 2
    The referenced questions all have an 'extra' id column that seems to be lacking here making their solutions not applicable. – deroby Mar 11 '14 at 20:12
  • @deroby +1 Thanks for the answer i can try it. – Black White Mar 11 '14 at 20:13
  • Hmm.. just realized I went over the top and assumed you wanted all 4 columns to identify uniqueness. I'll adapt so it only looks at col1 and col2, but the question then becomes: how do you know you want the 2nd and 3rd row removed and not the 1st one ? Or doesn't it matter as long as you only keep 1 of the three, whichever that may be. – deroby Mar 11 '14 at 20:16
  • @deroby col3 and col4 can be different or can be same not a problem. But col1="10" and col2="0" combination can only be one. Or col1="11" and col2="0" etc. – Black White Mar 11 '14 at 20:21

2 Answers2

2

This should work. It first finds out how many duplicates there are per combination and then deletes them all but one.

CREATE TABLE t_test (col1 int, col2 int, col3 int, col4 int)

INSERT t_test 
          SELECT 10, 0, 1000, 1 
UNION ALL SELECT 10, 0, 1000, 1 --> should be deleted
UNION ALL SELECT 10, 0, 1111, 2 --> should be deleted
UNION ALL SELECT 10, 1, 1000, 1
UNION ALL SELECT 10, 2, 1000, 1
UNION ALL SELECT 15, 0, 1000, 1
UNION ALL SELECT 15, 0, 1000, 1 --> should be deleted
UNION ALL SELECT 16, 0, 1000, 1

DECLARE @col1 int, @col2 int, @count int

DECLARE delete_loop CURSOR LOCAL STATIC
    FOR SELECT COUNT(*), col1, col2
          FROM t_test
         GROUP BY col1, col2
        HAVING COUNT(*) > 1
OPEN delete_loop
FETCH NEXT FROM delete_loop INTO @count, @col1, @col2
WHILE @@FETCH_STATUS = 0
    BEGIN
        DELETE TOP (@count - 1)
          FROM t_test
         WHERE col1 = @col1
           AND col2 = @col2

        FETCH NEXT FROM delete_loop INTO @count, @col1, @col2
    END
CLOSE delete_loop
DEALLOCATE delete_loop

SELECT * FROM t_test

EDIT: Adapted to only look at col1 and col2 for uniqueness.

deroby
  • 5,902
  • 2
  • 19
  • 33
  • Succesfully working. Thanks for the solution @deroby – Black White Mar 11 '14 at 20:32
  • You're welcome. If it would be too slow you can first add an index on col1 & col2, run the fix and them remove the index again. That is, if there is no such an index yet off course. – deroby Mar 11 '14 at 20:45
0

Here is an easy method to identify duplicates, and delete them.

Add an id that increments for each combination of col1 & col2 (partition by), wrap that up in a CTE, and delete the records that are not equal to 1 (the first occurrence).

DECLARE @Test TABLE (col1 int, col2 int, col3 int, col4 int)

INSERT @Test 
          SELECT 10, 0, 1000, 1 
UNION ALL SELECT 10, 0, 1000, 1 --> should be deleted
UNION ALL SELECT 10, 0, 1111, 2 --> should be deleted
UNION ALL SELECT 10, 1, 1000, 1
UNION ALL SELECT 10, 2, 1000, 1
UNION ALL SELECT 15, 0, 1000, 1
UNION ALL SELECT 15, 0, 1000, 1 --> should be deleted
UNION ALL SELECT 16, 0, 1000, 1

;WITH DUPES
AS
(
SELECT *, ROW_NUMBER() OVER(PARTITION BY COL1,COL2 ORDER BY COL1,COL4) AS myID
FROM @Test
)

DELETE D
FROM DUPES D
WHERE myID <> 1

SELECT * 
FROM @Test
Dave C
  • 7,272
  • 1
  • 19
  • 30