-1

I loaded some data into a SQL Server table from a .CSV file for test purposes, I don't have any primary key, unique key or auto-generated ID in that table.

Helow is an example of the situation:

select * 
from people
where name in (select name 
               from people
               group by name
               having count(name) > 1)

When I run this query, I get these results:

enter image description here

The goal is to keep one row and remove other duplicate rows.

Is there any way other than save the content somewhere else, delete all duplicate rows and insert a new one?

Thanks for helping!

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Chase
  • 195
  • 1
  • 1
  • 10

2 Answers2

2

You could use an updatable CTE for this.

If you want to delete rows that are exact duplicates on the three columns (as shown in your sample data and explained in the question):

with cte as (
    select row_number() over(partition by name, age, gender order by (select null)) rn
    from people
)
delete from cte where rn > 1

If you want to delete duplicates on name only (as shown in your existing query):

with cte as (
    select row_number() over(partition by name order by (select null)) rn
    from people
)
delete from cte where rn > 1
GMB
  • 216,147
  • 25
  • 84
  • 135
1

How are you defining "duplicate"? Based on your code example, it appears to be by name.

For the deletion, you can use an updatable CTE with row_number():

with todelete as (
      select p.*,
             row_number() over (partition by name order by (select null)) as seqnum
      from people p
     )
delete from todelete
    where seqnum > 1;

If more columns define the duplicate, then adjust the partition by clause.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786