-2

I have these lines of code which you can find below and I want to automatize this (most probably using a loop function) by just giving the input: 'CITY', 'COUNTY', 'PARTNERS', 'PRODUCT'. Is there any way to do this? Also the names of temporary tables ('CUSTOMERS_MND_0', CUSTOMERS_MND_1', 'CUSTOMERS_MND_2', 'CUSTOMERS_MND_3', 'CUSTOMERS_MND_4') are not important, so those can be renamed as the variable names.

The issue I'm facing is that "UNIQUE_KEY" is not unique and I need to have it unique in the final db. Please find an example.

Initial DB:

UNIQUE_KEY CITY COUNTY PARTNERS PRODUCT
111 VIENNA A W 1
111 NAPLES B X 7
112 VIENNA B Y 3
113 NAPLES B W 4
113 NAPLES A W 4
114 VIENNA A W 1
115 VIENNA B W 4
115 NAPLES A W 4
115 VIENNA B X 7
115 VIENNA B Y 3
116 NAPLES B W 4
116 NAPLES A W 4
116 VIENNA A W 1

FINAL DB:

UNIQUE_KEY CITY COUNTY PARTNERS PRODUCT
111 VIENNA A W 1
112 VIENNA B Y 3
113 NAPLES B W 4
114 VIENNA A W 1
115 VIENNA B Y 3
116 VIENNA A W 1
SELECT AA.*
INTO #CUSTOMERS_MND_1
FROM #CUSTOMERS_MND_0 AA
     INNER JOIN (SELECT UNIQUE_KEY,
                        MAX(CITY) AS MAXCITY
                 FROM #CUSTOMERS_MND_0
                 GROUP BY UNIQUE_KEY) BB ON AA.UNIQUE_KEY = BB.UNIQUE_KEY
                                        AND AA.CITY = BB.MAXCITY;
DROP TABLE #CUSTOMERS_MND_0;

SELECT AA.*
INTO #CUSTOMERS_MND_2
FROM #CUSTOMERS_MND_1 AA
     INNER JOIN (SELECT UNIQUE_KEY,
                        MAX(COUNTY) AS MAXCOUNTY
                 FROM #CUSTOMERS_MND_1
                 GROUP BY UNIQUE_KEY) BB ON AA.UNIQUE_KEY = BB.UNIQUE_KEY
                                        AND AA.COUNTY = BB.MAXCOUNTY;
DROP TABLE #CUSTOMERS_MND_1;

SELECT AA.*
INTO #CUSTOMERS_MND_3
FROM #CUSTOMERS_MND_2 AA
     INNER JOIN (SELECT UNIQUE_KEY,
                        MAX(PARTNERS) AS MAXPARTNERS
                 FROM #CUSTOMERS_MND_2
                 GROUP BY UNIQUE_KEY) BB ON AA.UNIQUE_KEY = BB.UNIQUE_KEY
                                        AND (AA.PARTNERS = BB.MAXPARTNERS
                                          OR AA.PARTNERS IS NULL
                                         AND BB.MAXPARTNERS IS NULL);
DROP TABLE #CUSTOMERS_MND_2;

SELECT AA.*
INTO #CUSTOMERS_MND_4
FROM #CUSTOMERS_MND_3 AA
     INNER JOIN (SELECT UNIQUE_KEY,
                        MAX(PRODUCT) AS MAXPRODUCT
                 FROM #CUSTOMERS_MND_3
                 GROUP BY UNIQUE_KEY) BB ON AA.UNIQUE_KEY = BB.UNIQUE_KEY
                                        AND (AA.PRODUCT = BB.MAXPRODUCT
                                          OR AA.PRODUCT IS NULL
                                         AND BB.MAXPRODUCT IS NULL);
DROP TABLE #CUSTOMERS_MND_3;

It works, but for more joins it is not time efficient.

  • *"I am using SQL Server Management Studio v15."* If you're using SSMS 15 why tag MySQL and PSQL (which is the dialect used by PostgreSQL) too..? Tag spam doesn't help us help you; it makes it harder to. – Thom A Apr 01 '22 at 09:06
  • 1
    Also whitespace and Linebreaks are paramount to making readable text; not just in code. Please get into the habit of making good use of both. Poor/bad formatting is not going to help you or others when you need to be able to quickly read and understand your code. Using indentation, instead of all left aligned text, and line breaks, really helps easily distinguish specific code blocks and sections, and make finding errors far easier when a line only contains 10's of characters, rather than 100's. – Thom A Apr 01 '22 at 09:06
  • @Larnu Thank you for your input! It's my first post so I didn't know that is not ok to tag all the dialects of SQL. – Andrei Mihailescu Apr 01 '22 at 09:17
  • 1
    The tag excerpts do also tell you this. From [[tag:mysql]]: *"DO NOT USE this tag for other DBs such as SQL Server, SQLite etc."* – Thom A Apr 01 '22 at 09:19
  • 2
    As for your problem, looping is likely the *last* thing you want to do; SQL is a set based language and looping is the complete opposite of that methodology. Ideally you need to give us some sample data, expected results (in a consumable format) and explain the logic you are trying to achieve. – Thom A Apr 01 '22 at 09:21
  • 2
    Instead of asking us to reverse engineer your code to work out what you are trying to achieve you could tell us. – P.Salmon Apr 01 '22 at 09:42
  • Looks like a multi-pass de-duplicator. However, each pass is using the same `UNIQUE_KEY`, meaning there's nothing left to de-duplicate after the first pass. The 2nd, 3rd, 4th pass appear to be totally redundant. I agree with P.Salmon; please explain your requirements. Provide sample input data and desired output. – Ruud Helderman Apr 01 '22 at 09:59
  • I am not sure how to add a table here for the examples. After the first join the "UNIQUE_KEY" is not unique so that's why I need the other 3 joins. – Andrei Mihailescu Apr 01 '22 at 10:06
  • @Larnu: a little nitpicking :) `psql` is the name of the default command line client for the database named PostgreSQL. Which doesn't have a special name for their SQL dialect - it's just "SQL" –  Apr 01 '22 at 10:12
  • 1
    'After the first join the "UNIQUE_KEY" is not unique so that's why I need the other 3 joins.' so what's the logic for deciding which of the duplicates to keep? – P.Salmon Apr 01 '22 at 10:13
  • Possibly you need a top-1-per-group solution like `ROW_NUMBER` https://stackoverflow.com/questions/6841605/get-top-1-row-of-each-group for example `SELECT c.* FROM(SELECT *, rn = ROW_NUMBER() OVER (PARTITION BY UNIQUE_KEY) FROM #CUSTOMERS_MND_0) c WHERE rn = 1` – Charlieface Apr 01 '22 at 10:17
  • I thought that is obvious that these are some dummny examples and the logic in reality is not this. For you the only logic is the MAX operator and nothing more. Could you automatize this if you know the logic for why I take the MAX and not the MIN or any other function? – Andrei Mihailescu Apr 01 '22 at 10:20
  • Images of data doesn't help us help you, @AndreiMihailescu, we can't use it. As I mentioned, provide it in a consumable format. DDL and DML are by far the best method. – Thom A Apr 01 '22 at 10:27
  • my guess is you want the max city per unique key, within that city the max county, within that county the max partners and within that partner the max product ? – P.Salmon Apr 01 '22 at 11:04
  • No it's not obvious at all. We can only see what you show us, and have to make assumptions based on that. Please be clearer about what you are trying to achieve, don't just give us a code dump – Charlieface Apr 01 '22 at 13:09

1 Answers1

1

my guess is you want the max city per unique key, within that city the max county, within that county the max partners and within that partner the max product in which case generating a row number and selecting row number 1 gives the same result as your code

create table #CUSTOMERS_MND_0
(unique_key int,city varchar(10),county varchar(10),partners varchar(10),product int);
go

insert into #CUSTOMERS_MND_0 values
(111,'vienna','a','w',7),(111,'naples','b','x',7),(111,'vienna','b','w',6),(111,'vienna','b','x',5)
go
 
with cte as
(select *, row_number() over (partition by unique_key order by city desc , county desc, partners desc,product desc) rn 
from
#CUSTOMERS_MND_0
)
select * from cte where rn = 1;

your code plus

select * from #CUSTOMERS_MND_3

unique_key  city       county     partners   product
----------- ---------- ---------- ---------- -----------
111         vienna     b          x          5

my code

unique_key  city       county     partners   product     rn
----------- ---------- ---------- ---------- ----------- --------------------
111         vienna     b          x          5           1

Now we have simplified code we can think about automation. You say you want to supply known columns as part of this process but unique key will always be required and source tables is known so--see https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=355f06fb7d51ba8fed9ea4e6d1e23d07 for an example of how using a temp table to hold the columns and the priority order in this case by identity. Then read up on dynamic sql..

P.Salmon
  • 17,104
  • 2
  • 12
  • 19