1

My task is to write a query that will return sales information for each customer category and year. The columns required in the result set are:

  • OrderYear - the year the orders were placed
  • CustomerCategoryName - as it appears in the table Sales.CustomerCategories
  • CustomerCount - the number of unique customers placing orders for each CustomerCategoryName and OrderYear
  • OrderCount - the number of orders placed for each CustomerCategoryName and OrderYear
  • Sales - the subtotal from the orders placed, calculated from Quantity and UnitPrice of the table Sales.OrderLines
  • AverageSalesPerCustomer - the average sales per customer for each CustomerCategoryName and OrderYear

The results should be sorted in ascending order, first by order year, then by customer category name.

My attempt at a solution:

SELECT 
    CC.CustomerCategoryName,
    YEAR(O.OrderDate) AS OrderYear,
    COUNT(DISTINCT C.CustomerID) AS CustomerCount,
    COUNT(DISTINCT O.OrderID) AS OrderCount,
    SUM(OL.Quantity * OL.UnitPrice) AS Sales,
    SUM(OL.Quantity * OL.UnitPrice) / COUNT(DISTINCT C.CustomerID) AS AverageSalesPerCustomer
FROM
    Sales.CustomerCategories CC
INNER JOIN 
    Sales.Customers C ON C.CustomerCategoryID = CC.CustomerCategoryID
INNER JOIN 
    Sales.Orders O ON O.CustomerID = C.CustomerID
INNER JOIN 
    Sales.OrderLines OL ON OL.OrderID = O.OrderID
GROUP BY 
    CC.CustomerCategoryName, YEAR(O.OrderDate)
ORDER BY 
    YEAR(O.OrderDate), CC.CustomerCategoryName;

My OrderCount seems correct. However, I don't believe my CustomerCount is correct and my Sales and AverageSalesPerCustomer seem way off. The Categories that do not have any customers and orders do not show up in my results.

Is the reason that my counts are off and that he categories that do not have any customers are omitted is because they only have null values? I believe the question is looking for all the categories.

I am using the sample tables of WideWorldImporters from Microsoft.

Any help would be appreciated as I am new to SQL and Joins are a very hard concept for me to understand.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
  • 1
    Replace the "inner join"s with "left join" – Mazhar Sep 14 '18 at 22:24
  • @Mazhar Doing that added the missing categories to the top of my result, but they arent there grouped in with every year grouping so they only appear once. I know it doesn't make sense to the engine because they dont have an order year to group with though. Otherwise it still seems that my sales are inccorect. it states my average sales per customer is 67149 with just 41 customers with 1187 orders. Am I calculating this value incorrectly? How would I check the accuracy here – RobWantsToLearn Sep 14 '18 at 22:35
  • Possible duplicate of [Multiple Self-Join based on GROUP BY results](https://stackoverflow.com/q/47758492/3404097) – philipxy Sep 14 '18 at 23:08
  • Hi. [mcve] please. Part of doing that includes justifying "minimal code with a problem" for your required functionality by finding a maximal part of that code that works for part of that functionality. Show that your program calculates what you expect it to as it goes through (sub)expressions. Please clarify via post edits, not comments. PS Likely you want a join on PK of separate join-aggregations. PS Grouping does not involve ordering. PS [Is there any rule of thumb to construct SQL query from a human-readable description?](https://stackoverflow.com/a/33952141/3404097) – philipxy Sep 14 '18 at 23:13
  • 1
    This seems to be a faq. Please always google many clear, concise & specific versions/phrasings of your question/problem/goal with & without your particular strings/names & read many answers. Add relevant keywords you discover to your searches. If you don't find an answer then post, using one variant search for your title & keywords for your tags. See the downvote arrow mouseover text. – philipxy Sep 14 '18 at 23:14
  • Try this: `SELECT OrderYear, Sum(OrderCount) as OrdersPerYear, Sum(Sales) as TotalSales FROM (SELECT Year(O.OrderDate) as OrderYear, C.CustomerID, COUNT(Distinct O.OrderId) as OrderCount, SUM(OL.Quantity * OL.UnitPrice) AS Sales FROM Sales.Customers C INNER JOIN Sales.Orders O ON O.CustomerID = C.CustomerID INNER JOIN Sales.OrderLines OL ON OL.OrderID = O.OrderID GROUP BY Year(O.OrderDate), C.CustomerId) as TestSource`. Use this to evaluate the total orders, and average order cost, overall. If this doesn't match your expectations, dig into your data. – Laughing Vergil Sep 14 '18 at 23:55
  • You can also JOIN the Sales.CustomerCategories table into the outer query if you return the C.CustomerCategoryID in the inner query. Do that as a second step when you have validated step 1. Also, are you considering cancelled orders in your queries (if applicable)? – Laughing Vergil Sep 14 '18 at 23:58
  • @LaughingVergil I get error: "Column 'TestSource.OrderYear' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause." when using the query you suggested. I am also not allowed to use any subqueries if that is what you are referring to in your latter comment – RobWantsToLearn Sep 15 '18 at 05:51
  • If this is still open question then please provide queries to create the relevant table(s) and insert sample data, as well as the result set that you want to get. – Ronen Ariely Sep 15 '18 at 13:17

1 Answers1

1

Presently, you're getting only the data that exists in order details...and not getting anything for the non-existent orders. Normally, this is accomplished with outer joins instead of inner joins, and an isnull(possiblyNullValue,replacementValue).

Also, while you're grouping by year(o.OrderDate), your join for orders isn't distinguishing by year...probably getting all years worth of data for each customer for each reporting period.

So, let's get the reporting period out first...and make sure we're basing our results on that:

select distinct year(o.OrderDate) from Sales.Orders

But really, you want all categories and all years...so you can combine them to get the real basis:

select
  cc.CustomerCategoryId,
  cc.CustomerCategoryName, 
  year(o.OrderDate)
from 
  Sales.Orders o
    cross join
  Sales.CustomerCategories cc
group by
  cc.CustomerCategoryId,
  cc.CustomerCategoryName, 
  year(o.OrderDate)

Now, you want to join this mess into the remaining query. There are two ways to do this...one is to use a with clause...but sometimes it's just easier to just wrap the basis query up in parentheses and use it as if it was a table:

select
  cy.CustomerCategoryName,
  cy.CalendarYear,
  count(distinct c.CustomerId) CustomerCount,
  isnull(sum(ol.UnitPrice * ol.Quantitiy),0.0) Sales,
  isnull(sum(ol.UnitPrice * ol.Quantitiy) / count(distinct c.CustomerId),0.0) AverageSalesPerCustomer
from
  (
    select
      cc.CustomerCategoryId,
      cc.CustomerCategoryName, 
      year(o.OrderDate) CalendarYear --> must name calc'd cols in virtual tables
    from 
      Sales.Orders o
        cross join
      Sales.CustomerCategories cc
    group by
      cc.CustomerCategoryId,
      cc.CustomerCategoryName, 
      year(o.OrderDate)
  ) as cy --> cy is the "Category Years" virtual table
    left outer join
  Sales.Customers c
    on cy.CustomerCategoryId = c.CustomerCategoryId
    left outer join
  Sales.Orders o
    on 
      c.CustomerId = o.CustomerId          --> join on customer and year
      and                                  --> to make sure we're only getting 
      cy.CalendarYear = Year(o.OrderDate)  --> orders in the right year
    left outer join
  Sales.OrderLines ol
    on o.OrderId = ol.OrderId
group by
  cy.CalendarYear,
  cy.CustomerCategoryName
order by
  cy.CalendarYear,
  cy.CustomerCategoryName

By the way...get comfortable messing with your queries to select some subset...for example, you can add a where clause to select only one company...and then go have a look at the details...to see if it passes the smell test. It's a lot easier to evaluate the results when you limit them. Similarly, you can add the customer to the select list and the outer grouping for the same reason. Experimentation is the key.

Clay
  • 4,999
  • 1
  • 28
  • 45
  • thank you very much for this explanation, it helped a ton in my understanding. I will keep experimenting which seems the best way to learn:) – RobWantsToLearn Sep 15 '18 at 18:56
  • is there a way to do this query without using a subquery, I can't seem to figure it out without the virtual table you created? – RobWantsToLearn Sep 17 '18 at 03:12
  • The reason you need the subquery at all is to "isolate" the years. You don't technically need to include the categories with it...but I thought it made more sense that way. You could create a view that returns only the years and use that instead. This situation is not uncommon when you don't have a "normalized" source...but instead have a source that's a calculated from details. Such a view definition could be `create view Years as select distinct year(OrderDate) CalendarYear from Sales.Orders` – Clay Sep 17 '18 at 11:33
  • Is there a way to do an equivalent without these subqueries (derived tables) or using views? We haven't learned about these yet, and I am suppose to accomplish this task with strictly joins. I see how it works well with these subqueires but it explicitly states we cannot use them – RobWantsToLearn Sep 17 '18 at 17:55