-1

I am new to SQL queries, I have some data and I am trying to find the result which is shown below.

In my sample data, I have customer ID repeating multiple times due to multiple locations, What I am looking to do is create a query which gives output shown in image output format,

  1. If customer exists only once I take that row
  2. If customer exists more than once, I check the country; if Country = 'US', I take that ROW and discard others
  3. If customer exists more than once and country is not US, then I pick the first row

PLEASE NOTE: I Have 35 columns and I dont want to change the ROWS order as I have to select the 1st row in case customer exist more than once and country is not 'US'.

What I have tried: I am trying to do this using rank function but was unsuccessful. Not sure if my approach is right, Please anyone share the T-SQL query for the problem.

Regards, Rahul

Sample data:

Data

Output required :

Output

  • While asking a question, you need to provide a [minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example): (1) DDL and sample data population, i.e. CREATE table(s) plus INSERT T-SQL statements. (2) What you need to do, i.e. logic and your code attempt implementation of it in T-SQL. (3) Desired output, based on the sample data in the #1 above. (4) Your SQL Server version (SELECT @@version;). All within the question, no images. – Yitzhak Khabinsky Dec 21 '21 at 17:21
  • @YitzhakKhabinsky. Thanks I will take care of your points, though the issue is I am new and the images was taken automatically I just copied the data from excel. Thanks much for sharing post details. – Rahul the DFS Scout Dec 21 '21 at 17:27
  • @marc_s I am working on MSSQL 2016, That is just a tag but I feel the query logic will be same irrespective of version. correct ? – Rahul the DFS Scout Dec 21 '21 at 17:50
  • How do you define the ordering of not-US rows? – shawnt00 Dec 21 '21 at 18:52
  • Does this answer your question? [Get top 1 row of each group](https://stackoverflow.com/questions/6841605/get-top-1-row-of-each-group) Just use `OVER (PARTITION BY cust_id ORDER BY CASE WHEN country = 'US' THEN 1 ELSE 2 END, country)` – Charlieface Dec 21 '21 at 21:28
  • @shawnt00 Hi, the order of the ROWS should be same and I do not want to change the order. my 3rd point states that. I need to pick the first ROW in case there are multiple customer and their country is not 'US'. Hence the order of ROWS should not be changed. – Rahul the DFS Scout Dec 22 '21 at 11:04
  • There is no "natural order" of rows. Common misunderstanding of databases. – shawnt00 Dec 22 '21 at 16:30
  • @shawnt00.. could you please explain your statement There is no "natural order" of rows ? – Rahul the DFS Scout Dec 23 '21 at 10:00
  • There is no "same order" without a column to specify the order. – shawnt00 Dec 23 '21 at 16:06

2 Answers2

1

I have created a (short) dbfiddle

Short explanation (to just repeat the code here on SO):

Step1:

-- select everyting, and 'US' as first row
SELECT 
   cust_id, 
   country,
   sales,
   CASE WHEN country='US' THEN 0 ELSE 1 END X,
   ROW_NUMBER() OVER (PARTITION BY cust_id 
                      ORDER BY (CASE WHEN country='US' THEN 0 ELSE 1 END)) R
FROM table1
ORDER BY cust_id, CASE WHEN country='US' THEN 0 ELSE 1 END;

Step2:

-- filter only rows which are first row...
SELECT * 
FROM (
   SELECT 
      cust_id, 
      country,
      sales,
      CASE WHEN country='US' THEN 0 ELSE 1 END X,
      ROW_NUMBER() OVER (PARTITION BY cust_id 
                         ORDER BY (CASE WHEN country='US' THEN 0 ELSE 1 END)) R
   FROM table1
   -- ORDER BY cust_id, CASE WHEN country='US' THEN 0 ELSE 1 END
   ) x
WHERE x.R=1
Luuk
  • 12,245
  • 5
  • 22
  • 33
0

I can't vouch for performance but it should work on SQL Server 2005. Assuming your table is named CustomerData try this:

select cust_id, country, Name, Sales, [Group]
from CustomerData
where country = 'US'
union
select c.* from CustomerData c
join (
    select cust_id, min(country) country
    from CustomerData
    where cust_id not in (
        select cust_id
        from CustomerData
        where country = 'US'
        )
    group by cust_id
) a on a.cust_id = c.cust_id and a.country = c.country

It works by finding all those with a record with US as the country and then unioning that with the first country from every record that doesn't have the US as a country. If min() isn't getting the country you want then you'll need to find an alternative aggregation function that will select the country you want.

  • Hi, Thanks for your solution. I would like to mention here, My data has almost 30-35 fields, union is not a good solution. Also, your solution has Min(country) I think that will also not work for my data. Many thanks for your efforts though I need some other solution, I am using MSSQL 2016. – Rahul the DFS Scout Dec 21 '21 at 18:24
  • 1
    It's pretty arbitrary to reject an answer because you think 35 columns is too much for a union, especially when those kinds of details are omitted from the question. – shawnt00 Dec 21 '21 at 18:50
  • @shawnt00 Sorry but I was not expecting a solution using Union. Apologize if I missed to mention that I have 35 columns. Let me edit my query right now. – Rahul the DFS Scout Dec 22 '21 at 11:06
  • Rahul, you realize the * is readily replaced by the list of five columns required? – shawnt00 Dec 22 '21 at 16:38
  • Is this possible to do the same using EXCEL ? well I need to cross verify the result of the query.. kind of unit testing and I have like 3000 rows, manually validating it is kind of problematic. Excel solution is also appreciable. Thanks much everyone for this great help. – Rahul the DFS Scout Dec 23 '21 at 10:02