2

I have two tables:

Table A:

Name, isPair

Table B:

Name1, Name2, Status

This is my query:

SELECT Name
FROM A
LEFT JOIN B ON (A.Name = B.Name2)
WHERE A.isPair = 'T' AND (B.status <> 'valid' OR B.status IS NULL)

I have millions of rows in both tables. At the current queries speed it will take over 3 months to complete. I have indexed both tables appropriately. When I originally did an INNER JOIN it only took 10 minutes to complete, but I discovered the query wasn't returning rows that were not in Table B's Name2 column, which was a problem as I need them returned.

Nick.Mc
  • 18,304
  • 6
  • 61
  • 91
Mitchell
  • 169
  • 1
  • 14
  • 1
    You're joining on a name, which I imagine will be either a VARCHAR or a CHAR column. See http://stackoverflow.com/questions/332300/is-there-a-real-performance-difference-between-int-and-varchar-primary-keys – rurouni88 Oct 27 '14 at 04:09
  • In addition to @rurouni88 ... what do you mean by "indexed both tables appropriately"? – TeamTam Oct 27 '14 at 04:34
  • @rurouni88 thanks but that's not the major cause of the slowness. I've multi joined both tables with other tables before and queries have taken under 10 minutes. – Mitchell Oct 27 '14 at 04:37
  • @TeamTam I have indexed the tables. Name is unique. – Mitchell Oct 27 '14 at 04:37
  • It would help to compare query plans. If you only want to display columns from `A` (and none from `B`) then you could try using `EXISTS` instead of outer joining to the table, though usually the query planner is smart enough to do that for you behind the scenes. Can you indicate whether the actual underlying database is SQL Server or MySQL? – Nick.Mc Oct 27 '14 at 04:51
  • @Nick.McDermaid MySQL with InnoDB engine – Mitchell Oct 27 '14 at 04:58
  • @Nick.McDermaid And also that's not quite what I'm trying to do. I want to return all Name's from Table A which isPair='T' and does not appear in Table B with its status equal to 'valid'. – Mitchell Oct 27 '14 at 05:06
  • If you only want to display Names from table and A and do not want to display (filter, yes) anything from Table B it's a candidate to rewrites using `EXISTS`. But long before we try that I suggest providing some kind of comparative query plan. between the INNER and OUTER join. I've also added those two tags for you. – Nick.Mc Oct 27 '14 at 05:41
  • @Nick.McDermaid Okay, that makes sense. I'm a bit confused by comparative query plan. Could you elaborate further? My old query literally just replaced LEFT JOIN with INNER JOIN. – Mitchell Oct 27 '14 at 05:55
  • When analysing performance the first thing you look at is the query plan, i.e. how the database is going to use indexes, joins etc. . If you have a fast query and a slow query you compare query plans between the two. Looking at this link http://stackoverflow.com/questions/16304875/not-efficient-execution-plan-taken-by-mysql-innodb it seems if you prefix the query with `EXPLAIN` it will show the query plan. If you compare the two plans you might find one isn't using indexes or you might find they are identical. It's the first step to performance optimisation. – Nick.Mc Oct 27 '14 at 06:19
  • I started writing a `NOT EXISTS` query for you... but I don't even know if InnoDB has those keywords. – Nick.Mc Oct 27 '14 at 06:22
  • @Nick.McDermaid I would very much appreciate if you posted that. InnoDB does have NOT EXISTS. :) – Mitchell Oct 27 '14 at 06:23

1 Answers1

3

This query might return the correct results faster, or it might return incorrect results with no speed improvement

This is all based on SQL Server knowledge but I assume InnoDB has the same characteristics.

SELECT Name 
FROM A
WHERE A.isPair = 'T'
AND NOT EXISTS (
    SELECT 1 FROM B 
    WHERE A.Name = B.Name2
    AND B.status = 'valid'
    )

I hope I have rearranged the Boolean logic correctly.

Before you were searching for records in A that had no match in B or a match with status <> valid

The new query returns records from A where it can't find a match in B with status = valid. Hopefully that is the same thing.

There are two database concepts I am using here:

  1. When using EXISTS it can just see if the table record exists, it doesn't have to join to the table and retrieve values from it..... most query planners do this automatically though so this is a long shot

  2. The operator <> is non sargable which means it can't utilise any index which contains the column status... i.e. it can't explicitly seek an the index for the absence of a value, it can only search an index for specific value(s). So I've changed it to = for this reason and also because it supports the NOT EXISTS logic

Again I don't know much about InnoDB but I'm sure if it didn't have these limitations the tricks would have been copied by Oracle and Microsoft already.

Comparing query plans will give you some idea of whether this rewrite makes any difference. ALso comparing query plans between the existing INNER and OUTER versions of your query might shed some light on things.

Nick.Mc
  • 18,304
  • 6
  • 61
  • 91
  • I've heard it more than once that for some reasons anti-joins in MySQL work more efficiently when implemented using `LEFT JOIN ... WHERE ... IS NULL` rather than using `NOT EXISTS`. Therefore, it might make sense to try another alternative: `... LEFT JOIN B ON `A.Name = B.Name2 AND B.status = 'valid' WHERE B.Name2 IS NULL`. I'm not a MySQL expert, so just commenting (but feel free to add this version to your answer if you think it worthwhile). – Andriy M Nov 05 '14 at 10:20