2

Good day guys, I've been struggling with this for the past day and I just can't seem to figure it out.

My task is to derive the most sold product for each country from the popular open source database called NORTHWIND: https://northwinddatabase.codeplex.com

I was able to get to this stage, here is my code in SQL Server:

--Get most sold product for each country
WITH TotalProductsSold AS 
(
    SELECT od.ProductID, SUM(od.Quantity) AS TotalSold
        FROM [Order Details] AS od
        GROUP BY od.ProductID
)
SELECT MAX(TotalProductsSold.TotalSold) AS MostSoldQuantity, s.Country --,p.ProductName
    FROM Products AS p
    INNER JOIN TotalProductsSold
    ON TotalProductsSold.ProductID = p.ProductID
    INNER JOIN Suppliers AS s
    ON s.SupplierID = p.SupplierID
    GROUP BY s.Country
    ORDER BY MostSoldQuantity DESC

This gives me the following result:

enter image description here

That's all good but I wish to find out the product name for the MostSoldQuantity.

Thank you very much !

P.S I put a comment --p.ProductName where I thought it would work but it didnt and if someone could explain me why does GROUP BY not automatically allow me to derive the product name for the row that would be great

  • I assume you know your proposed answer doesn't work, as you deleted it. The important *reason* is that the `MAX(productID)` has nothing at all to do with the `MAX(TotalSold)`, they're calculated independently. (You create the data set, you create groups inside the data set, within in each group you get the highest TotalSold and separately you get the highest value of ProductID. There is no SQL expression for *"get the productID that is associated with the result of `MAX(TotalSold)` as calculated in another expression in the SELECT list`"*). – MatBailie Nov 27 '17 at 14:20
  • Yeah I realized as soon as I check the productIDs do not match to the most sold quantity – William Abboud Nov 27 '17 at 14:29
  • I am looking at your answer now, thank you for your time :) – William Abboud Nov 27 '17 at 14:29

1 Answers1

1

First, start with the count of products sold, per country, not just per product. Then rank them and pick only anything at RANK = 1. Something like...

WITH
    ProductQuantityByCountry AS 
(
    SELECT
       s.CountryID,
       p.ProductID,
       SUM(od.Quantity)   AS Quantity
    FROM
        [Order Details]   AS od
    INNER JOIN
        Products          AS p
            ON  p.ProductID = od.ProductID
    INNER JOIN
        Suppliers         AS s
            ON  s.SupplierID = p.SupplierID
    GROUP BY
       s.CountryID,
       p.ProductID
),
    RankedProductQuantityByCountry
AS
(
    SELECT
        RANK() OVER (PARTITION BY CountryID ORDER BY Quantity DESC)  AS countryRank,
        *
    FROM
        ProductQuantityByCountry
)
SELECT
    *
FROM
    RankedProductQuantityByCountry
WHERE
    countryRank = 1

Note, one country may supply identical quantity of different producs, and so two products could both have rank = 1. Look into ROW_NUMER() and/or DENSE_RANK() for other but similar behaviours to RANK().

EDIT: A simple though exercise to cover why SQL doesn't let you put Product.Name in your final query is to ask a question.

What should SQL do in this case?

SELECT
    MAX(TotalProductsSold.TotalSold) AS MostSoldQuantity,
    MIN(TotalProductsSold.TotalSold) AS LeastSoldQuantity,
    s.Country,
    p.ProductName
FROM
    blahblahblah
GROUP BY
    s.Country
ORDER BY
    MostSoldQuantity DESC

The presence of a MIN and a MAX makes things ambiguous.

You may be clear that you want to perform an operation by country and that operation to be to pick the product with the highest sales volume from that country. But it's not actually explicit, and small changes to the query could have very confusing consequences to any inferred behaviour. Instead SQL's declarative syntax provides a very clear / explicit / deterministic description of the problem to be solved.

If an expression isn't mentioned in the GROUP BY clause, you can't SELECT it, without aggregating it. This is so that there is no ambiguity as to what is meant or what the SQL engine is supposed to do.

By requiring you to stipulate get the total sales per country per product at one level of the query, you can then cleanly state and then pick the highest ranked per country at another level of the query.

This can feel like you end up with queries that are longer than "should" be necessary. But it also results in queries that are completely un-ambiguous, both for compiling the query down to an execution plan, and for other coders who will read your code in the future.

MatBailie
  • 83,401
  • 18
  • 103
  • 137
  • Thank you very much ! I got it to work and I used your ideas. I am still a beginner in SQL and I didn't know about the ranking functions and I was thinking there's gotta be an easier solution without going beyond GROUP BY and aggregate functions but so far your answer is the only one that solves my problem and not in a bad way imo. Thanks man ! :) – William Abboud Nov 27 '17 at 15:01
  • You an (if I recall correctly) do `RANK() OVER (PARTITION BY CountryID ORDER BY SUM(Quantity) DESC) AS CountryRank` in the first CTE to shorten it, I just separated it to make it clearer for you. – MatBailie Nov 27 '17 at 15:24