0

I am having an issue removing duplicated rows. I've tried using ORDER BY to remove duplicates but it needs to be more sophisticated than that and I am not sure how to do that. I have some duplicate rows that I would want to remove the one where the account number is null but what is making it complicated is that there are some rows that there is no account number at all and I want to keep those.

For example:

       AppID | First Name | Last Name |   SSN  | Account Number
Keep:  1021    John         Doe        ********  124989
Remove 1021    John         Doe        ********  NULL
Keep:  1287    Mary         Smith      ********  NULL
  • Which [DBMS](https://en.wikipedia.org/wiki/DBMS) product are you using? "SQL" is just a query language, not the name of a specific database product. Please add a [tag](https://stackoverflow.com/help/tagging) for the database product you are using `postgresql`, `oracle`, `sql-server`, `db2`, ... –  Oct 08 '19 at 14:03
  • Also, this is a common problem. In the Stack Overflow search bar, try "sql delete duplicate rows". I did so just now, and got 1037 results, and I all but guarantee that one of the first five will provide the info you need. (Plus, this is a great way to learn additional stuff you'd never think to ask!) – Philip Kelley Oct 08 '19 at 14:07
  • `ORDER BY` doesn't add or remove any results, it just changes the order in which they're displayed. What you need is a tweak or an addition to your `WHERE` clause. Something along the lines of `WHERE (AppID IS NOT NULL AND Account Number IS NOT NULL)` – Jay Oct 08 '19 at 14:14

3 Answers3

0

How to delete duplicate rows in SQL Server?

It should help you resolve your problem. For next time you should search StackOverFlow more careful.

Robert Daraż
  • 323
  • 2
  • 19
0
select * from (
select data.*, 
---------you can add colums after partition by part. and decide which rows is needed and take them
row_number() over(partition by appid,first_name,last_name,ssn order by case when trim (account_number)= '' then 0 else 1 end desc)  rn
-----
from  (
select 1021 appid,'john' first_name,'doe' last_name, '****' ssn,124989 account_number from dual
union all
select 1021 appid,'john' first_name,'doe' last_name, '****' ssn,null account_number from dual
union all
select 1287 appid,'mary' first_name,'smith' last_name, '****' ssn,null account_number from dual
)data
)where rn=1
  • Welcome to stackoverflow. In addition to the answer you've provided, please consider providing a brief explanation of why and how this fixes the issue. – jtate Oct 08 '19 at 14:45
0

One method is row_number(). If you want one row for all combos of the first five columns:

select t.*
from (select t.*,
             row_number() over (partition by AppID, First Name, Last, Name, SSN
                                order by (case when account_number is not null then 1 else 2 end)
                               ) as seqnum
      from t
     ) t
where seqnum = 1;

Use rank() if you want to keep all rows that have valid accounts -- although you might need additional logic if you can have multiple rows with NULL accounts.

However, aggregation might also do what you want:

select AppID, First Name, Last, Name, SSN, max(account_number)
from t
group by AppID, First Name, Last, Name, SSN;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786