Removing Partial Duplicates Ignoring Nulls in SQL

Question

I have a table with two possible unique identifiers (ID1 and ID2). Each row will have one or both of these identifiers. The data in each row is exactly the same for each ID, with the exception of a timestamp. I would like to eliminate duplicates from each value, but treat nulls as unique values.

This question: How to delete duplicate rows in sql server?

Referred me to this site: http://www.codaffection.com/sql-server-article/delete-duplicate-rows-in-sql-server/

where I came up with the following query:

WITH CTE AS
(
SELECT *,ROW_NUMBER() OVER (PARTITION BY ID1 ORDER BY ID1) AS RN
FROM Filings_Search
)

DELETE FROM CTE WHERE RN<>1

Unfortunately, this deleted all my null values as well! How can I modify this query to avoid deleting null values?

Edit: Here is a sample of what my data would look like (if anyone knows how to format tables well, let me know. I used https://senseful.github.io/text-table/).

+------+------+----------+-----------+
| ID1  | ID2  |   Data   | Timestamp |
+------+------+----------+-----------+
| NULL | abc  | macd     | 01:40     |
| NULL | abc  | macd     | 04:23     |
| NULL | def  | pfchangs | 01:41     |
| 123  | NULL | wendys   | 02:42     |
| 123  | NULL | wendys   | 03:45     |
+------+------+----------+-----------+

Running on ID1 would output:

+------+------+----------+-----------+
| ID1  | ID2  |   Data   | Timestamp |
+------+------+----------+-----------+
| NULL | abc  | macd     | 01:40     |
| NULL | abc  | macd     | 04:23     |
| NULL | def  | pfchangs | 01:41     |
| 123  | NULL | wendys   | 02:42     |
+------+------+----------+-----------+

Running on ID2 would output:

+------+------+----------+-----------+
| ID1  | ID2  |   Data   | Timestamp |
+------+------+----------+-----------+
| NULL | abc  | macd     | 01:40     |
| NULL | def  | pfchangs | 01:41     |
| 123  | NULL | wendys   | 02:42     |
| 123  | NULL | wendys   | 03:45     |
+------+------+----------+-----------+

Apologies if this is a duplicate, I am a SQL beginner and couldn't find anything exactly like what I was looking for.

Show us db schema, sample data, current and expected output. Please read [**How-to-Ask**](http://stackoverflow.com/help/how-to-ask) And here is a great place to [**START**](http://spaghettidba.com/2015/04/24/how-to-post-a-t-sql-question-on-a-public-forum/) to learn how improve your question quality and get better answers. — Juan Carlos Oropeza, Feb 05 '19 at 17:28
What were the NULL values? Right now isnt clear what is the data and what row you want delete and which one you want to keep. So we can't really help you beside guessing. — Juan Carlos Oropeza, Feb 05 '19 at 17:31
To format as `code`, select the text and press `ctrl-k` or add 4 spaces before each line — Juan Carlos Oropeza, Feb 05 '19 at 18:09

score 0 · Accepted Answer · answered Feb 05 '19 at 17:29

0

what about:

 DELETE FROM CTE 
 WHERE RN<>1
   AND ID1 IS NOT NULL

answered Feb 05 '19 at 17:29

Juan Carlos Oropeza

47,252
12
78
118

You should change your `ORDER BY ID1` to `ORDER BY timestamp` – Juan Carlos Oropeza Feb 05 '19 at 18:12
Consider mark this answer as correct with the green tick if help you solve the problem – Juan Carlos Oropeza Feb 05 '19 at 18:12

score 0 · Answer 2 · answered Feb 05 '19 at 17:49

0

use ID2,and Data in partition by

   WITH CTE AS (
        SELECT f.*, ROW_NUMBER() OVER (PARTITION BY  ID2,data ORDER BY Timestamp ) AS RN
        FROM Filings_Search 
    )
    DELETE FROM CTE WHERE RN<>1

answered Feb 05 '19 at 17:49

Zaynul Abadin Tuhin

31,407
5
33
63

Removing Partial Duplicates Ignoring Nulls in SQL

2 Answers2