-2

I have this table in SQL where I would like to delete duplicate values if paidDate or AppDate are not populated.

With the sample below, I would only like to delete the first John row, because neither paidDate nor AppDate are not populated for that row, but I would like to keep Lisa in this table because it is not a duplicate.

Is there a way to do this?

enter image description here

Joel Coehoorn
  • 399,467
  • 113
  • 570
  • 794
  • Does this answer your question? [How to delete duplicate rows in SQL Server?](https://stackoverflow.com/questions/18390574/how-to-delete-duplicate-rows-in-sql-server) – Thom A Sep 13 '22 at 19:16
  • 5
    Posting images of technical information like sample data, instead of formatted text, is the fastest way I've seen here other than spam to have a question downvoted and then closed or ignored. You will be much more likely to get a good, prompt answer if you **edit the question** to replace the image with well-formatted text for the sample data. – Joel Coehoorn Sep 13 '22 at 19:17
  • I'm not sure what you mean by "Row = 1". The "row 1" is your header . – Thom A Sep 13 '22 at 19:18
  • @Larnu It's confusing, but I think my edit clears it up. And if I got it wrong, the OP is free to correct it. Play silly games (ask a confusing question), win silly prizes. – Joel Coehoorn Sep 13 '22 at 19:20
  • No @Larnu that does not my question by Row = 1. I meant Row =2 where AppDate and PaidDate is NULL If it works right Row =2 will be deleted and Row = 5 will stay because it does not have duplicates – Tanpreet Singh Sep 13 '22 at 19:26
  • What happens if 2 (or more) rows both have `NULL` for both columns? – Thom A Sep 13 '22 at 19:30
  • Then I would only like to keep 1 of those of 2 rows @Larnu In real I will be using ClaimNumber instead of FullName to ensure I have unique values – Tanpreet Singh Sep 13 '22 at 19:45
  • Sample data and expected results (showing all edge cases) *as text* not images would help immensely – Charlieface Sep 13 '22 at 21:33

2 Answers2

2

Just an alternative approach, using a CTE and a windowed functions:

WITH CTE AS(
    SELECT FullName,
            PaidDate,
            AppDate,
            COUNT(CASE WHEN PaidDate IS NOT NULL OR AppDate IS NOT NULL THEN 1 END) AS C
    FROM dbo.YourTable)
DELETE FROM CTE
WHERE C > 0
  AND PaidDate IS NULL
  AND AppDate IS NULL;
Thom A
  • 88,727
  • 11
  • 45
  • 75
0
DELETE m1
FROM MyTable m1
LEFT JOIN MyTable m2 ON m2.FullName = m1.FullName 
    AND (m2.PaidDate IS NOT NULL or m2.AppDate IS NOT NULL)
WHERE m1.PaidDate IS NULL AND m1.AppDate IS NULL
    AND m2.FullName IS NOT NULL

or

DELETE FROM MyTable m1
WHERE m1.PaidDate IS NULL AND m1.AppDate IS NULL
    AND NOT EXISTS(
       SELECT 1 
       FROM MyTable
       WHERE FullName = m1.FullName AND (AppDate IS NOT NULL OR PaidDate IS NOT NULL)
    )

The second NOT EXISTS approach tends to run a little faster, but my own history makes me much faster at writing/maintaining the first exclusion join option.

I really hate using FullName as a key like this. It's rare for a dateset to ensure values like full name are distinct between entities. It's very common to find legitimate duplicate names.

Joel Coehoorn
  • 399,467
  • 113
  • 570
  • 794