0

I am trying to create a sequel query for emails of customers without accounts that have only placed orders on over 3 months. If i do the query

SELECT checkouts.email 
FROM orders, checkouts 
WHERE orders.id = checkouts.order_id AND orders.user_id is NULL 
AND orders.created_at < CURDATE() - INTERVAL 3 MONTH

I get all emails from orders without accounts that were placed within the last 3 months, BUT it doesn't account for orders placed within the last 3 months without an account with the same emails from the sql query email list. Therefore, i want to use a MINUS operator to subtract all emails from recent orders. When i try this, i get an error:

SELECT checkouts.email 
FROM orders, checkouts 
WHERE orders.id = checkouts.order_id AND orders.user_id is NULL 
AND orders.created_at < CURDATE() - INTERVAL 3 MONTH 
MINUS 
SELECT checkouts.email 
FROM orders, checkouts 
WHERE orders.id = checkouts.order_id AND orders.user_id is NULL 
AND orders.created_at > CURDATE() - INTERVAL 3 MONTH

If i run this same query but with a UNION instead of a MINUS, it works, and essentially gives me all emails from orders without an account, regardless of the date.

Why does UNION work and not MINUS? How do i fix this query so i can get MINUS to work?

Zyren
  • 603
  • 2
  • 7
  • 21
  • 2
    MySQL does not support `MINUS` or `EXCEPT` – Martin Smith Feb 24 '12 at 19:46
  • Ok. That sucks. Any idea how to do this without using minus? – Zyren Feb 24 '12 at 19:48
  • 1
    Other ways of doing anti semi joins are (1) `OUTER JOIN` and filter on `NULL`, (2) `NOT EXISTS` (3) `NOT IN`. [Some comparisons here](http://explainextended.com/2009/09/18/not-in-vs-not-exists-vs-left-join-is-null-mysql/) – Martin Smith Feb 24 '12 at 19:49

2 Answers2

1

Try this construct instead (there may be better execution plans with other ways of saying it):

SELECT checkouts.email 
FROM orders, checkouts 
WHERE orders.id = checkouts.order_id AND orders.user_id is NULL 
AND orders.created_at < CURDATE() - INTERVAL 3 MONTH 
AND checkouts.email NOT IN (
    SELECT checkouts.email 
    FROM orders, checkouts 
    WHERE orders.id = checkouts.order_id AND orders.user_id is NULL 
    AND orders.created_at > CURDATE() - INTERVAL 3 MONTH
)

You may also want to add handling for checkouts.email IS NULL, since that WILL NOT be NOT IN (if that phrasing makes any sense), because NULL is never = or <> to anything.

SELECT checkouts.email 
FROM orders, checkouts 
WHERE orders.id = checkouts.order_id AND orders.user_id is NULL 
AND orders.created_at < CURDATE() - INTERVAL 3 MONTH 
AND (
    checkouts.email IS NULL
    OR
    checkouts.email NOT IN (
        SELECT checkouts.email 
        FROM orders, checkouts 
        WHERE orders.id = checkouts.order_id AND orders.user_id is NULL 
        AND orders.created_at > CURDATE() - INTERVAL 3 MONTH
    )
)

You might also consider more simply:

SELECT checkouts.email, MAX(orders.created_at)
FROM orders, checkouts 
WHERE orders.id = checkouts.order_id AND orders.user_id is NULL
GROUP BY checkouts.email 
HAVING MAX(orders.created_at) < CURDATE() - INTERVAL 3 MONTH 
Cade Roux
  • 88,164
  • 40
  • 182
  • 265
  • Your answer looks good, but the query is still running and hasn't finished. I will let you know if this works as expected. – Zyren Feb 24 '12 at 19:59
  • @Zyren I would try an `OUTER LEFT JOIN` to the subquery instead. It's possible that the execution plan is actually running that set for each row in the first set instead of joining the two sets. – Cade Roux Feb 24 '12 at 20:02
  • @Zyren Unfortunately, MySQL also doesn't support CTEs/subquery factoring, so you cannot use those techniques to make it more performant or readable either: http://stackoverflow.com/a/1382618/18255 – Cade Roux Feb 24 '12 at 20:04
  • @Zyren I added a much simpler construction which I would think would be pretty efficient. – Cade Roux Feb 24 '12 at 20:21
0

MySQL doesn't do date arithmetic... you have to use DATE_ADD() or DATE_SUB() such as

DATE_ADD( CURDATE(), INTERVAL 3 MONTH )
DATE_SUB( CURDATE(), INTERVAL 3 MONTH )

Also, dates can be a PITA. If the created_at is inclusive of time, you want to make sure you get the DATE ONLY portion so it implies 12:00:00 AM (which is ok with the CURDATE() function). However, similar with your ENDING Date... if your ending date is Feb 24, 2012 but want to include up to Feb 24, 2012 11:59:59pm, you would be best to do LESS than one DAY more.. ex: < Feb 25, 2012

DRapp
  • 47,638
  • 12
  • 72
  • 142