I've inherited a SQL Server database that has duplicate data in it. I need to find and remove the duplicate rows. But without an id field, I'm not sure how to find the rows.
Normally, I'd compare it with itself using a LEFT JOIN
and check that all fields are the same except the ID field would be table1.id <> table2.id
, but without that, I don't know how to find duplicates rows and not have it also match on itself.
TABLE:
productId int not null,
categoryId int not null,
state varchar(255) not null,
dateDone DATETIME not null
SAMPLE DATA
1, 3, "started", "2016-06-15 04:23:12.000"
2, 3, "started", "2016-06-15 04:21:12.000"
1, 3, "started", "2016-06-15 04:23:12.000"
1, 3, "done", "2016-06-15 04:23:12.000"
In that sample, only rows 1 and 3 are duplicates.
How do I find duplicates?