4

This is a bit of a weird question, so the best way to ask it is with an example. I have a list of customers. I want to get any customer who has a corresponding entry in either the CourseHistory table or the Access table (or both).

I want an optimal single query (no subqueries) that fetches these customers. I came up with

SELECT
   c.cusid
FROM
   Customers c
   CROSS JOIN Realms r
   LEFT JOIN Course.CourseHistory ch ON (c.cusid = ch.cusid)
   LEFT JOIN Access a ON (c.cusid = a.cusid AND r.realmid = a.realmid)
WHERE
   realmname = 'Course'
   AND COALESCE(chid, accid)

This works but it is noticeably slow, probably because it has to do a full scan of Customers. Since either CourseHistory or Access can be null and the result still be valid, they have to be left joined. Is there a more correct way to do this query?

Explosion Pills
  • 188,624
  • 52
  • 326
  • 405

4 Answers4

4

Get rid of that CROSS JOIN to Realms and INNER JOIN that table to Access instead.

SELECT
   c.cusid
FROM
   Customers c
   LEFT JOIN Course.CourseHistory ch ON (c.cusid = ch.cusid)
   LEFT JOIN Access a 
       INNER JOIN realms r
           ON a.realmid = r.realmid
               AND r.realmname = 'Course'
       ON c.cusid = a.cusid
WHERE
   COALESCE(chid, accid)
Joe Stefanelli
  • 132,803
  • 19
  • 237
  • 235
  • 1
    This returns different results. I also thought it was incorrect to have an inner join after an outer join in the list of joins. It is faster, though. – Explosion Pills Nov 23 '11 at 21:04
  • Specifically this will only return results that have an `Access` ID because the LEFT JOIN on `Access` is nullified by the inner join on Realm (which can't return a null row). – Explosion Pills Nov 23 '11 at 21:10
  • @tandu I think you're misunderstanding this query. Customers is being left joined to the derived table formed by the inner join between Access and Realms. This does not negate the left join. – Joe Stefanelli Nov 23 '11 at 21:25
  • I ran his query manually and it does have fewer results. That I know. My second response is mostly postulation. – Explosion Pills Nov 23 '11 at 21:52
  • Hey Joe, your answer showed me the correct relationships to use in my answer. +1 to avoid copyright infringement. – RolandoMySQLDBA Feb 15 '12 at 18:50
3

Here is your original query

SELECT  
   c.cusid  
FROM  
   Customers c  
   CROSS JOIN Realms r  
   LEFT JOIN Course.CourseHistory ch ON (c.cusid = ch.cusid)  
   LEFT JOIN Access a ON (c.cusid = a.cusid AND r.realmid = a.realmid)  
WHERE  
   realmname = 'Course'  
   AND COALESCE(chid, accid)  
; 

From you comments, I realize this now

  • Realm can reach Access
  • Access reach customer, but you will not need to
  • Access can reach Course.CourseHistory via cusid

Given this path, here is the refactored query

SELECT r.cusid
FROM
(SELECT realmid FROM Realms WHERE realmname = 'Course') r
LEFT JOIN
(SELECT realmid,cusid,accid FROM Access) a ON r.realmid=a.realmid
LEFT JOIN
(SELECT cusid FROM Course.CourseHistory) ch ON a.cusid=ch.cusid
WHERE COALESCE(chid, accid);

You will need the following indexes

ALTER TABLE Realms ADD INDEX realmname_realmid_ndx (realmname,realmid);
ALTER TABLE Access ADD INDEX realmid_cusid_accid_ndx (realmid,cusid,accid);

Give it a Try !!!

RolandoMySQLDBA
  • 43,883
  • 16
  • 91
  • 132
  • Realms does not have a cusid column. – Explosion Pills Feb 15 '12 at 18:28
  • @RolandoMySQLDBA, excuse me, but I want to know: do you really mean `(SELECT realmid,cusid,accid FROM Access) a` when you suggest it, that is select all ids from the table into unindexed space in memory, and apply a condition after that by checking each row? – newtover Feb 15 '12 at 19:28
  • @newtover Yes I do. That index will make the query fetch the needed data from the index only and never touch the Access table. It servers as a covering index in this respect. – RolandoMySQLDBA Feb 15 '12 at 19:30
  • 1
    Each subquery was made as small as possible. Cartesian JOINs of keys should be faster that Cartesian JOINs of whole tables. Making an index to support a subquery is the key in this instance. I have done this before : http://stackoverflow.com/a/6023217/491757 – RolandoMySQLDBA Feb 15 '12 at 19:34
1

Your current query starts with a Cartesian product of Customers x Realms and should return too many duplicates. There also should be a lot of duplicates that come from joins on Course.CourseHistory and Access if they have more than a single record for each customer.

The following query should be more effective. The UNION in the query applies implicit DISTINCT, that is all ids are unique. Moreover, the query allows MySQL optimizer to benefit from table statistics and arrange joins in an optimal order.

SELECT cuid
FROM Customers c
JOIN Course.CourseHistory ch USING (cuid)
UNION
SELECT cuid
FROM Realms r
JOIN Access a USING (realmid)
JOIN Customers c USING (cuid)
WHERE r.realmname = 'Course';
newtover
  • 31,286
  • 11
  • 84
  • 89
  • `CROSS JOIN`s in MySQL do not require explicit join conditions. If they do, at least MySQL never complains about it. Doesn't the `UNION` count as two separate queries? The cartesian product is in fact the goal..ordinarily it would be limited by the keys of the other tables but since they can both be NULL it's not. – Explosion Pills Feb 15 '12 at 17:39
  • @tandu, it does require an explicit condition. It is stated in docs (http://dev.mysql.com/doc/refman/5.1/en/join.html) and easy to check, and, thus, does not produce warnings. I do not understand your point about counting queries, how do you count them and how do you measure what is better? If the query returns correct results and works fast enough, is not it what you need? – newtover Feb 15 '12 at 17:49
  • I don't see anything in the docs that say `CROSS JOIN` requires an explicit join condition. In actuality, neither `CROSS JOIN` nor `INNER JOIN` require it. – Explosion Pills Feb 15 '12 at 17:53
  • @tandu, your comment about Cartesian product makes no sence. If it is intended, the same result (though with less duplicates) can be achieved with complete removal of the `Realms` table from the query. – newtover Feb 15 '12 at 17:54
  • @tandu, ok, I meant that you hardly ever need a Cartesian product. At least, this case does not seem to need it. – newtover Feb 15 '12 at 17:58
  • Well then if I wasn't going to use a `UNION`, what would you suggest? – Explosion Pills Feb 15 '12 at 18:05
  • @tandu, what is wrong with a `UNION`? I actually do not understand your fears =) – newtover Feb 15 '12 at 18:06
  • I don't think anything is *wrong* with union, and it does appear to produce the correct results, I'd just like to see this done in a single query. – Explosion Pills Feb 15 '12 at 18:27
0

You seem to want to gather all of the customer IDs from CourseHistory and Access (within the "Course" realm) and retrieve the customers that match those IDs. Here's a query that does exactly that.

SELECT
    c.cusid
FROM
    Customers c
    INNER JOIN (
        SELECT ch.cusid FROM Course.CourseHistory AS ch
        UNION
        SELECT a.cusid FROM Access a
        INNER JOIN Realms r ON r.realmid = a.realmid AND r.realmname = 'Course'
    ) AS ids ON c.cusid = ids.cusid

If you really only need the customer ID, and the customer table is guaranteed to contain all existing customer IDs, then you can dispense with the outer select and just use the inner UNION.

If you really need a single query, then you'll have to settle for something worse. This at least is an improvement over your original query:

SELECT
    c.cusid
FROM
    Customers c
    LEFT JOIN Course.CourseHistory ch ON (c.cusid = ch.cusid)
    LEFT JOIN Access a ON (c.cusid = a.cusid)
    LEFT JOIN Realms r ON (r.realmid = a.realmid AND realmname = 'Course')
WHERE
    AND chid IS NOT NULL OR realmname IS NOT NULL
Brilliand
  • 13,404
  • 6
  • 46
  • 58