1

I have a CTE inside a SQL Stored Procedure that is UNIONing values from two databases - the values are customer numbers and that customer's last order date.

Here is the original SQL -

;WITH CTE_last_order_date AS
(
SELECT c1.customer ,MAX(s2.dt_created) AS last_order_date
FROM customers c1 WITH (NOLOCK)

LEFT JOIN archive_orders s2 WITH (NOLOCK)
ON c1.customer = s2.customer

GROUP BY c1.customer

UNION ALL

SELECT c1.customer ,MAX(s1.dt_created) AS last_order_date
FROM customers c1 WITH (NOLOCK)

LEFT JOIN orders s1 WITH (NOLOCK)
ON c1.customer = s1.customer

GROUP BY c1.customer
)

Example Results:

customer,    last_order_date
CF122595,    2011-11-15 15:30:22.000
CF122595,    2016-08-15 10:01:51.230

(2 row(s) affected)

This obviously doesn't apply the UNION distinct records rule because the date values are not matched, meaning SQL returned the max value from both tables (i.e. the final record set was not distinct)

To try and get around this, I tried another method borrowed from this question and implemented grouping:

;WITH CTE_last_order_date AS
(
SELECT max(last_order_date) as 'last_order_date', customer
FROM (
SELECT distinct cust.customer, max(s2.dt_created) AS last_order_date, '2' AS 'group'
FROM customers c1 WITH (NOLOCK)

LEFT JOIN archive_orders s2 WITH (NOLOCK)
ON c1.customer = s2.customer

GROUP BY c1.customer

UNION 

SELECT distinct c1.customer, max(sord.dt_created) AS last_order_date, '1' AS 'group'
FROM customers c1 WITH (NOLOCK)

LEFT JOIN orders s1 WITH (NOLOCK)
ON cust.customer = sord.customer

GROUP BY
   c1.customer
   ) AS t
GROUP  BY customer
ORDER  BY MIN('group'), customer
)

Example Results:

customer,    last_order_date
CF122595,    2016-08-15 10:01:51.230

(1 row(s) affected)

This had the distinction (hah) of working fine, up until clattering into the rule that prevents ORDER BY inside Common Table Expressions, which is needed in order to pick the lowest group (which would imply Live orders (group 1), whose date needs to take precedence over the Archive (group 2)).

The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP or FOR XML is also specified.

All help or ideas appreciated.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Optimaximal
  • 545
  • 1
  • 5
  • 28
  • 1
    you have to ORDER outside the CTE, unless you are using SELECT TOP , or otherwise require an order. You can ORDER outside the CTE. So what ORDER do you require? What are you trying to return here? Could you give example data and results? Are you saying you want the max from the two tables? if so union the two tables in a sub-query, thentake the max of the sub-query. – Cato Jan 15 '19 at 10:08
  • More detail added, including results. – Optimaximal Jan 15 '19 at 10:23

4 Answers4

2

Rather than grouping, then unioning, then grouping again, why not union the orders tables and work from there:

SELECT c1.customer ,MAX(s2.dt_created) AS last_order_date
FROM customers c1
INNER JOIN (select customer, dt_created from archive_orders
union all select customer, dt_created from orders) s2
ON c1.customer = s2.customer
GROUP BY c1.customer

Remember, in SQL your job is to tell the system what you want, not what steps/procedure to follow to get those results. The above, logically, describes what we're wanting - we want the last order date from each customer's orders, and we don't care whether that was an archived order or a non-archived one.

Since we're going to reduce the order information down to a single row (per customer) during the GROUP BY behaviour anyway, we don't also need the UNION to remove duplicates so I've switched to UNION ALL.

(I confess, I couldn't really see what the ORDER BY was supposed to be adding to the mix at this point so I've not tried to include it here. If this is going into a CTE, then reflect on the fact that CTEs, just like tables and views, have no inherent order. The only ORDER BY clause that affects the ordering of result rows is the one applied to the outermost/final SELECT)


Giving orders precedence over archived_orders:

;With CTE1 as (
    SELECT c1.customer,group,MAX(s2.dt_created) as MaxInGroup
    FROM customers c1
    INNER JOIN (select customer, dt_created,2 as group from archive_orders
    union all select customer, dt_created,1 from orders) s2
    ON c1.customer = s2.customer
    GROUP BY c1.customer,group
), CTE2 as (
    SELECT *,ROW_NUMBER() OVER (PARTITION BY customer ORDER BY group) as rn
    from CTE2
)
select * from CTE2 where rn = 1
Damien_The_Unbeliever
  • 234,701
  • 27
  • 340
  • 448
  • Very good points to make and proof why it's always good to have a fresh pair of eyes on these things. This is trying to modify work done by a third party based on some unplanned change of spec and I was trying to bolster a script that was already (seemingly) too complex. I also forgot to KISS. :D – Optimaximal Jan 15 '19 at 10:29
  • @Optimaximal - you may want to revert the accept for the time being. Your edit made clear that any order should take precedence over an archived one and this query doesn't yet. Working on it. – Damien_The_Unbeliever Jan 15 '19 at 10:30
  • @Optimaximal - see edit at bottom with precedence implemented too. – Damien_The_Unbeliever Jan 15 '19 at 10:34
  • Whilst I thank you for the added concept, I don't think it's needed. The need to give Live orders precedence over the Archived ones was a result of using the grouping method. Archived Orders are simply all orders > 3 years old and they cannot exist in both tables - If all the data is being rolled together, then the simple `MAX()` command is fine. Keep your Tick! :) – Optimaximal Jan 15 '19 at 10:56
1

An alternative approach could be to only get the customer from the archive table where we do not have a current one. Something like:

WITH CurrentLastOrders(customer, last_order_date) AS    -- Get current last orders
(
    SELECT o.customer, max(o.dt_created) AS last_order_date
    FROM orders s WITH (NOLOCK) ON c.customer = o.customer
    GROUP BY o.customer
),
ArchiveLastOrders(customer, last_order_date) AS -- Get archived last orders where customer does not have a current order
(
    SELECT o.customer, max(o.dt_created) AS last_order_date
    FROM archive_orders o WITH (NOLOCK)
    WHERE NOT EXISTS ( SELECT *
                        FROM CurrentLastOrders lo
                        WHERE o.customer = lo.customer)
    GROUP BY o.customer
),
AllLastOrders(customer, last_order_date) AS -- All customers with orders
(
    SELECT customer, last_order_date
    FROM CurrentLastOrders
    UNION ALL
    SELECT customer, last_order_date
    FROM ArchiveLastOrders
)
AllLastOrdersPlusCustomersWithNoOrders(customer, last_order_date) AS    -- All customerswith latest order if they have one
(
    SELECT customer, last_order_date
    FROM AllLastOrders
    UNION ALL
    SELECT customer, null
    FROM customers c WITH (NOLOCK)
    WHERE NOT EXISTS ( SELECT *
                        FROM AllLastOrders lo
                        WHERE c.customer = lo.customer)
)
PhilS
  • 624
  • 3
  • 5
1

I wouldn't try to nest SQL to achive a distinct result set, it's the same logic of grouping by customer in both unioned queries. If you want a distinct ordered set, you can do that outside of the CTE

How about:

;WITH CTE_last_order_date AS
(
   SELECT c1.customer ,s2.dt_created AS last_order_date, '2' AS 'group'
   FROM customers c1 WITH (NOLOCK)
   LEFT JOIN archive_orders s2 WITH (NOLOCK) ON c1.customer = s2.customer

   UNION ALL

   SELECT c1.customer ,s1.dt_created AS last_order_date, '1' AS 'group'
   FROM customers c1 WITH (NOLOCK)
   LEFT JOIN orders s1 WITH (NOLOCK) ON c1.customer = s1.customer

)
SELECT customer, MAX(last_order_date)
FROM CTE_last_order_date
GROUP BY customer 
ORDER BY MIN('group'), customer
Samir
  • 283
  • 2
  • 10
  • The CTE is being created to hold the last order dates in order to reference them as part a later `INSERT` command in the Stored Procedure, so they'd need to be pre-sorted. – Optimaximal Jan 15 '19 at 10:59
1

if you union all possible rows together, then calculate a row_number, partitioned on customer and ordered on 'group' then last_order_date descending, you can then select all the row=1 to give the 'top 1' per customer

;WITH CTE_last_order_date AS
(
SELECT max(last_order_date) as 'last_order_date', customer
FROM (
SELECT distinct cust.customer, max(s2.dt_created) AS last_order_date, '2' AS 'group'
FROM customers c1 WITH (NOLOCK)

LEFT JOIN archive_orders s2 WITH (NOLOCK)
ON c1.customer = s2.customer

GROUP BY c1.customer

UNION 

SELECT distinct c1.customer, max(sord.dt_created) AS last_order_date, '1' AS 'group'
FROM customers c1 WITH (NOLOCK)

LEFT JOIN orders s1 WITH (NOLOCK)
ON cust.customer = sord.customer

GROUP BY
   c1.customer
   ) AS t
GROUP  BY customer

)
,   --row_number below is 'per customer' and can be used to make rn=1 the top 1 for each customerid
ROWN AS (SELECT Customer,last_order_date,[group], row_number() OVER(partition by customer order by [group] ASC, sord.dt_created DESC) AS RN)
SELECT * FROM Rown WHERE Rown.rn = 1
Cato
  • 3,652
  • 9
  • 12