1

I have this tables:

CREATE TABLE customer_orders (
  "order_id" INTEGER,
  "customer_id" INTEGER,
  "pizza_id" INTEGER,
  "exclusions" VARCHAR(4),
  "extras" VARCHAR(4),
  "order_time" TIMESTAMP
);
INSERT INTO customer_orders
  ("order_id", "customer_id", "pizza_id", "exclusions", "extras", "order_time")
VALUES
  ('1', '101', '1', '', '', '2020-01-01 18:05:02'),
  ('2', '101', '1', '', '', '2020-01-01 19:00:52'),
  ('3', '102', '1', '', '', '2020-01-02 23:51:23'),
  ('3', '102', '2', '', NULL, '2020-01-02 23:51:23'),
  ('4', '103', '1', '4', '', '2020-01-04 13:23:46'),
  ('4', '103', '1', '4', '', '2020-01-04 13:23:46'),
  ('4', '103', '2', '4', '', '2020-01-04 13:23:46'),
  ('5', '104', '1', NULL, '1', '2020-01-08 21:00:29'),
  ('6', '101', '2', NULL, NULL, '2020-01-08 21:03:13'),
  ('7', '105', '2', NULL, '1', '2020-01-08 21:20:29'),
  ('8', '102', '1', NULL, NULL, '2020-01-09 23:54:33'),
  ('9', '103', '1', '4', '1, 5', '2020-01-10 11:22:59'),
  ('10', '104', '1', NULL, NULL, '2020-01-11 18:34:49'),
  ('10', '104', '1', '2, 6', '1, 4', '2020-01-11 18:34:49');

CREATE TABLE runner_orders (
  "order_id" INTEGER,
  "runner_id" INTEGER,
  "pickup_time" VARCHAR(19),
  "distance" DECIMAL(5,2) NULL,
  "duration" VARCHAR(10),
  "cancellation" VARCHAR(23)
);
INSERT INTO runner_orders
  ("order_id", "runner_id", "pickup_time", "distance", "duration", "cancellation")
VALUES
  ('1', '1', '2020-01-01 18:15:34', '20', '32', ''),
  ('2', '1', '2020-01-01 19:10:54', '20', '27', ''),
  ('3', '1', '2020-01-03 00:12:37', '13.4', '20', NULL),
  ('4', '2', '2020-01-04 13:53:03', '23.4', '40', NULL),
  ('5', '3', '2020-01-08 21:10:57', '10', '15', NULL),
  ('6', '3', NULL, NULL, NULL, 'Restaurant Cancellation'),
  ('7', '2', '2020-01-08 21:30:45', '25', '25mins', NULL),
  ('8', '2', '2020-01-10 00:15:02', '23.4', '15', NULL),
  ('9', '2', NULL, NULL, NULL, 'Customer Cancellation'),
  ('10', '1', '2020-01-11 18:50:20', '10', '10', NULL);

And I they gave me the question: For each customer, how many delivered pizzas had at least 1 change and how many had no changes?

Now, I tried to do this:

WITH cte_1
AS (
    SELECT co.customer_id
        , co.order_id
        , co.exclusions
        , isnull(co.exclusions,'') AS exc
        , isnull(co.extras,'') AS ext
    FROM customer_orders as co
    INNER JOIN runner_orders as ru 
        ON co.order_id = ru.order_id
    WHERE NOT ru.cancellation IN ('Restaurant Cancellation', 'Customer Cancellation')    
)    
SELECT customer_id
    , COUNT(order_id) AS no_changes 
FROM cte_1
WHERE exc = '' AND ext = ''
GROUP BY customer_id

As you can see this so long but at least works and the numbers are correct. But then when I try to do the same to get the result for pizzas with changes like this:

WITH cte_1
AS (
    SELECT co.customer_id
        , co.order_id
        , co.exclusions
        , isnull(co.exclusions,'') AS exc
        , isnull(co.extras,'') AS ext
    FROM customer_orders as co
    INNER JOIN runner_orders as ru 
        ON co.order_id = ru.order_id
    WHERE NOT ru.cancellation IN ('Restaurant Cancellation', 'Customer Cancellation')    
)    
SELECT customer_id
    , COUNT(order_id) AS with_changes 
FROM cte_1
WHERE not exc = '' 
GROUP BY customer_id
UNION
SELECT customer_id
    , COUNT(order_id) AS with_changes 
FROM cte_1
WHERE not ext = '' 
GROUP BY customer_id

It doesn't work, and I'm not sure if it's because of the Nulls and blanks or the approach in general. Please if there is a better and shorter way to do this.

EXPECTED RESULT would be something like:

customer_id orders_with_changes orders_with_NO_changes
101 --- 2
102 --- 3
103 3 ---
104 2 1
105 1 ---
Dale K
  • 25,246
  • 15
  • 42
  • 71
Ale
  • 107
  • 6
  • Can you update your post with the full expected output? – lemon Jun 03 '23 at 21:52
  • 4
    "it doesn't work" isn't a helpful problem statement. *Why* doesn't it work? – Thom A Jun 03 '23 at 21:52
  • @lemon sure! Let me write it down and I'll update the post. – Ale Jun 03 '23 at 21:55
  • @ThomA Hi nice to see you again Tom, you helped me before haha and yeah is like you say, I want to know Why it doesnt work so I can learn and fix it. – Ale Jun 03 '23 at 21:56
  • @lemon done, that would be the expected result. Remeber order 6 and 9 wont count because, in the runners_orders table it says they were canceled. – Ale Jun 03 '23 at 22:07
  • 1
    This doesn't appear to be valid SQL Server syntax - are you sure that is what you are using? – Dale K Jun 04 '23 at 00:08
  • And you almost certainly don't want `UNION` but want `UNION ALL` because `UNION` tries to de-dup the results and doesn't tend to perform too well. – Dale K Jun 04 '23 at 00:11
  • @DaleK You mean because of the tables or what I tried to do to solve the problem? – Ale Jun 04 '23 at 01:08
  • 1
    Well try and run your complete example in sql server and you'll see it doesn't run – Dale K Jun 04 '23 at 02:16
  • 2
    As an example https://dbfiddle.uk/JNIZ95LE – Dale K Jun 04 '23 at 02:47
  • @DaleK I see, that's weird because it works on my computer :/ Maybe If you change the TIMESTAMP for DATETIME? I did that but afer I created the table, part of the practice was to clean the data (there was some 'null' that I changed for NULL too) Here is the original table maybe you can spot what is wrong, I am new and when something goes wrong honestly I just panic haha 8weeksqlchallenge.com/case-study-2/ And please let me know what is wrong so I can learn and fix it here in my post :) – Ale Jun 05 '23 at 16:00
  • If it works on your computer then you aren't using SQL server. – Dale K Jun 05 '23 at 18:57
  • @DaleK oh? yes I am, I installed it and I use it everyday :) Maybe is your computer? Because Lemon didn't have problems with the table – Ale Jun 05 '23 at 21:27
  • SQL server does not support a `timestamp` which contains a datetime value. If you look at Lemons fiddle, they changed your `timestamp` to `datetime` which is the SQL Server datetime format. If you open the exercise you are working on in the provided fiddle it tells you clearly that its running on PostgreSQL which does support a `timestamp` datatype which takes a datetime value. If that fiddle works on your computer then you must also be running PostgreSQL - else it would give you an error. – Dale K Jun 05 '23 at 23:17
  • @DaleK oh I see, thats why in my computer it also appears as datetime! I thought I changed it after creating the table,I probably changed it while creating the table then! I started this practice case a week ago so I didnt remeber that. sorry and thank you! – Ale Jun 06 '23 at 15:32

1 Answers1

3

You can try approaching this problem by:

  • filtering on the runners table with the NOT EXISTS operator, to check and discard the orders that were subject to alteration by third-party, before any aggregation is carried out
  • applying conditional aggregation with COUNT + CASE expression, by counting records that satisfy conditions (orders without changes are the ones that have both extras and exclusions null, while order with changes are identified by the opposite condition)
SELECT customer_id, 
       COUNT(CASE WHEN NOT COALESCE(extras, '') = '' 
                    OR NOT COALESCE(exclusions, '') = '' THEN 1 END) AS num_with_changes,
       COUNT(CASE WHEN COALESCE(extras, '') = ''
                   AND COALESCE(exclusions, '') = '' THEN 1 END) AS num_with_no_changes
FROM customer_orders co
WHERE NOT EXISTS(SELECT 1 
                 FROM runner_orders ro 
                 WHERE co.order_id = ro.order_id 
                   AND NOT COALESCE(cancellation, '') = '')
GROUP BY customer_id

Output:

customer_id num_with_changes num_with_no_changes
101 0 2
102 0 3
103 3 0
104 2 1
105 1 0

Check the demo here.

lemon
  • 14,875
  • 6
  • 18
  • 38
  • wow thank you!. Well I need to look up the NOT EXISTS operator because I have not see it before and is blowing my mind that you dont need a JOIN to join the runners table, haha. – Ale Jun 03 '23 at 22:34
  • 1
    also this morning I tried and failed to use COUNT + CASE, but using ISNULL and got an error, haha Honestly I need to study CASE more I rarely get it right. Thank you again I'll sit down and see if I understand everything here – Ale Jun 03 '23 at 22:36
  • 1
    It's actually doing a join operation under the hood, as the condition is applied inside the subquery (`co.order_id = ro.order_id`), although there *may be* [some efficiency differences](https://stackoverflow.com/questions/227037/can-i-get-better-performance-using-a-join-or-using-exists). Usually I use `EXISTS` when I don't need to gather every column from the other table, but just check for specific conditions. – lemon Jun 03 '23 at 22:38
  • I see. And I have 2 minor questions. With COALESCE I knew that it ended with the parentesis () why do you add = '' after that at the end? / And my other question is, in the subquery why do write SELECT 1 ? Because 1 is not a column – Ale Jun 03 '23 at 22:49
  • 1
    `COALESCE` takes the first non-null column. In our case we have say `COALESCE(extras, '')`. So it will first check if extras is NULL: if it's not null, will take its value, otherwise will go on and check if '' is NULL (which never happens). At the end of the day that expression will convert all NULL values to '', so that the check can either contain '' or a value that indicates a change. – lemon Jun 03 '23 at 22:52
  • 1
    Concerning `SELECT 1`, the DBMS is returning exactly the value 1 if the condition is satisfied, or NULL otherwise. We could have written `SELECT 0`, `SELECT 1`, `SELECT extras`, and would have been just fine, because we're only checking the existence of any value that satisfy a condition (or its non-existence). Using an integer is just a simplification, as long as it's a value we don't really care about, unless it's NULL. – lemon Jun 03 '23 at 22:54
  • 1
    I see I see the COALESCE is starting to make sense. The second one I need to google it a bit to fully get it, that is quite new for me haha Thank you again! – Ale Jun 03 '23 at 22:59