Using Subqueries to remove duplicate IDs

Question

I have 2 Tables. Table 1 holds ID1 and ID2. Table 2 holds ID2 and ID3.

Table 1 has unique cases for ID1 and multiple cases for ID2.

Table 2 has unique cases for ID2 and multiple cases for ID3

I want 1 unique case of ID3.

I need remove duplicate ID2s from Table 1 picking to remove the duplicate ID2s based on the smaller ID1

So Table 1 now looks like:

Now I want to go to Table 2 and remove any duplicate ID3s based on the smaller ID2

So my end result should be (I am joining the tables because both of them have other relevant information I need to combine but these are the IDs I am sorting and filtering to get the correct row):

Final Table:
ID1 | ID2 | ID3
2     2     1
7     5     2
5     4     3
10    6     4
9     7     5

Where now I have a single case for each ID3 based on the largest ID1 and ID2 associated with that ID3.

I have tried creating subqueries in the WHERE function to remove the duplicates but my understanding of SQL is not good enough to really figure out what is happening.

Group By and DISTINCT does not work for this case.

Decision Tree

I added a Decision Tree to help visualize the problem. Essentially, each ID3 can potentially have multiple ID2s, which can potentially have multiple ID1s. I want to keep only the largest ID1, which gives me the correct ID2 associated with that ID3.

This question is asked almost daily and has numerous solutions on Stack Overflow, depending on your [RDBMS](https://meta.stackoverflow.com/questions/388759/why-should-i-tag-my-rdbms). Have you searched for similar questions? — Stu, Nov 03 '22 at 14:20
Yes, my case is typically more complicated than those. The solutions are normally just a group by or distinct. Neither of which work for me. — RIzz, Nov 03 '22 at 14:32
I am pulling my data from an SQL Server and coding in Microsoft SQL Server Managment Studio. — RIzz, Nov 03 '22 at 14:36
`SELECT t1.id1, t1.id2, t2.id3 FROM (SELECT id1, min(id2) as id2 FROM table1 GROUP BY id1) t1 INNER JOIN (SELECT id2, min(id3) as id3 FROM table2 GROUP BY id2) t2 ON t1.id2 = t2.id2;` The answer is still to GROUP BY. You can use two subqueries and then join the results. — JNevill, Nov 03 '22 at 14:38
I cannot use Group By because I am moving this into Power BI DirectQuery and Group By is not allowed in DQ — RIzz, Nov 03 '22 at 14:51
https://stackoverflow.com/questions/18932/how-can-i-remove-duplicate-rows?rq=1 I am reading this article to try and find a solution but I am having translating it specifically to my code/problem due to my lack of knowledge on SQL — RIzz, Nov 03 '22 at 14:52
Then your question should also be tagged with _something_ involving `Power BI DirectQuery` because the problem you're actually having is a limitation in that technology that has little/nothing to do with SQL Server, T-SQL, or the many duplicates. — Stuck at 1337, Nov 03 '22 at 14:56
Also if PowerBI has limitations in what syntax it can use to query SQL Server, can't you put SQL Server syntax into a view and just point PowerBI at a view? If a tool only supports a subset of the dialect then you can't expect the tool to solve complex problems. — Stuck at 1337, Nov 03 '22 at 15:02

score 0 · Accepted Answer · answered Nov 03 '22 at 15:41

0

with t1 as  (
select  ID1, ID2
from
(
select  *
       ,row_number() over(partition by ID2 order by ID1 desc) as rn
from    t 
) t
where rn = 1
            ),
t3 as       (
select  ID2, ID3
from
(
select  *
       ,row_number() over(partition by ID3 order by ID2 desc) as rn
from    t2
) t
where   rn = 1
            )

select   t1.ID1
        ,t1.ID2
        ,t3.ID3
from     t1 join t3 on t3.ID2 = t1.ID2
order by ID3

ID1	ID2	ID3
2	2	1
7	5	2
5	4	3
10	6	4
9	7	5

Fiddle

answered Nov 03 '22 at 15:41

DannySlor

4,574
1
3
11

This worked perfectly and I understand how this solution works now, I saw a few similar but never really understood them as much. Thank you. – RIzz Nov 03 '22 at 17:42
You're welcome. Please chose the answer and vote it up. – DannySlor Nov 03 '22 at 18:30
Is there a way to only use ID2's that exist in Table 1? Apparently Table 2 has ID2s that do not exist in Table 1. Like for example Table 2 might have ID2 | ID3 1 1 11 1 Table 1 has: ID1 | ID2 1 1 2 1 So my final table should be ID1 | ID2 | ID3 2 1 1 – RIzz Nov 03 '22 at 19:11
I edited your Fiddle to demonstrate. https://dbfiddle.uk/uSfmaeap – RIzz Nov 03 '22 at 19:39
The way I did it should work for you the way you want it. It only merges the tables on shared values. – DannySlor Nov 04 '22 at 05:18
I want the last row in the final table to be ID1 = 11, ID2 = 8, ID3 = 6 but because Table 2 has ID2 = 9 and ID3 = 6, the final table does not have ID3 = 6 at all because ID2 = 9 is not in Table 1. If that makes sense, you can look at the Fiddle link and it shows it. dbfiddle.uk/uSfmaeap – RIzz Nov 04 '22 at 13:14

Using Subqueries to remove duplicate IDs

1 Answers1