Over Partition to find duplicates and remove them based on criteria SQL

Question

I hope everyone is doing well. I have a dilemma that i can not quite figure out. I am trying to find a unique value for a field that is not a duplicate.

For example:

Table 1

|Col1  | Col2| Col3 |     
| 123  |  A  |  1   |     
| 123  |  A  |  2   |     
|  12  |  B  |  1   |     
|  12  |  B  |  2   |     
|  12  |  C  |  3   |     
|  12  |  D  |  4   |     
|  1   |  A  |  1   |     
|  2   |  D  |  1   |      
|  3   |  D  |  1   |

Col 1 is the field that would have the duplicate values. Col2 would be the owner of the value in Col 1. Col 3 uses the row number() Over Partition syntax to get the numbers in ascending order.

The goal i am trying to accomplish is to remove the value in col 1 if it is not truly unique when looking at col2.

Example:

Col1 has the value 123, Col2 has the value A. Although there are two instances of 123 being owned by A, i can determine that it is indeed unique.

Now look at Col1 that has the value 12 with values in Col2 of B,C,D.

Value 12 is associated with three different owners thus eliminating 12 from our result list.

So in the end i would like to see a result table such as this :

|Col1  | Col2| 
| 123  |  A  | 
|  1   |  A  |
|  2   |  D  | 
|  3   |  D  |

To summarize, i would like to first use the partition numbers to identify if the value in col1 is repeated. From there i want to verify that the values in col 2 are the same. If so the value in col 1 and col 2 remains as one single entry. However if the values in col 2 do not match, all records for the col1 value are removed.

I will provide the syntax code for my query if needed.

Update**

I failed to mention that table 1 is the result of inner joining two tables.

So Col1 comes from table a and Col2 comes from table b.

The values in table a for col2 are hard to interpret so i had to make sense of them and assigned it proper name values.

The join query i used to combine the two are:

Select a.Col1, B.Col2 FROM Table a INNER JOIN Table b on a.Colx = b.Colx

Update**

Table a:

|Col1  | Colx| Col3 |     
| 123  | SMS |  1   |     
| 123  | S9W |  2   |     
|  12  | NAV |  1   |     
|  12  | NFR |  2   |     
|  12  | ABC |  3   |     
|  12  | DEF |  4   |     
|  1   | SMS |  1   |     
|  2   | DEF |  1   |      
|  3   | DES |  1   |

Table b:

|Colx  | Col2| 
| SMS  |  A  | 
| S9W  |  A  |
| DEF  |  D  | 
| DES  |  D  |
| NAV  |  B  |
| NFR  |  B  |
| ABC  |  C  |

Above are sample data for both tables that get joined in order to create the first table displayed in this body.

Thank you all so much!

....If you have duplicates in the joined result set, almost always the proper way to do things is to remove duplicates at or before the join point. Please, show us the data in the origin tables - my bet is that you can dedup your `tableB` (probably based on whatever `colX` really is). Also, do you want to only _display_ the records, or do you need to `DELETE` the duplicates? — Clockwork-Muse, Dec 05 '18 at 19:39
@Clockwork-Muse i have updated my question to reflect the two tables that are joined. My goal is to find the unique value in which col 1 is only present in 1 unique value of col 2. so looking at my example Value 123 from col 1 has two instances with values in col 2 being A so it passes the test and it should return the value of 123 in my results. However value 12 from col 1 has 3 distinct values in col 2 (B,C,D) which means value 12 shares the same value with three different owners. I would like to remove any instances of that. — CCP, Dec 05 '18 at 20:42
Argh. No, you definitely need the join there. @GordonLinoff is right that `col3` is irrelevant to the query, though. — Clockwork-Muse, Dec 05 '18 at 21:19
Possible duplicate of [Finding duplicate values in a SQL table](https://stackoverflow.com/questions/2594829/finding-duplicate-values-in-a-sql-table) — Clockwork-Muse, Dec 05 '18 at 21:25
@Clockwork-Muse i ended up using a variation of GordonLinoff's solution. Thank you for providing the link. — CCP, Dec 12 '18 at 18:37

score 2 · Answer 1 · answered Dec 05 '18 at 17:03

2

NOT EXISTS operator can be used to do this task:

SELECT distinct Col1 ,  Col2
FROM table t
WHERE NOT EXISTS(
  SELECT 1 FROM table t1
  WHERE t.col1=t1.col1 AND t.col2 <> t1.col2
)

answered Dec 05 '18 at 17:03

krokodilko

35,300
7
55
79

I have attempted this but i am getting a query timeout. I have provided an update for information that i left out that i think is important. – CCP Dec 05 '18 at 18:49

score 0 · Accepted Answer · answered Dec 05 '18 at 17:01

0

If I understand correctly, you want:

select col1, min(col2)
from t
group by col1
where min(col2) <> max(col2);

I think the third column is confusing you. It doesn't seem to play any role in the logic you want.

answered Dec 05 '18 at 17:01

Gordon Linoff

1,242,037
58
646
786

Col 3 is the rownumber partition portion of the syntax. Col2 is actually another column in a different table (b) that i use to join with the table(a) for Col1. so table a has the owners in a different format and table b converts them to understandable values thus why some values from col 1 will populate twice when the value in col 2 are identical (123 example). As soon as my db2 application turns on i will attempt this syntax and let you know the results – CCP Dec 05 '18 at 17:14
I have updated my question to reflect the syntax structure i use to get Col2 – CCP Dec 05 '18 at 18:50
i ended up using a variation of this syntax and it did the job. Thank you so much for the help – CCP Dec 12 '18 at 18:38

Over Partition to find duplicates and remove them based on criteria SQL

2 Answers2