2

So, I have these two tables: tableA and tableB. Upon doing a simple inner join of these tables,

SELECT *
FROM tableA 
JOIN tableB
ON tableA.columnA = tableB.id

Now, tableA contains 29000+ rows, whereas tableB contains just 11000+ rows. tableB.id is a primary key, hence clustered. And there exists a non-clustered index on columnA.

According to my thinking, the query optimizer should treat tableB as the inner table while performing the join, because it has a lesser number of rows, and treat tableA as the outer table, as a lot of rows need to be filtered from tableA based on the value of the tableB.id column.

But, the exact opposite of this actually happens. For some reason, the query optimizer is treating tableA as the inner table and tableB as the outer table.

Can someone please explain why that happens and what error am I making in my thought process? Also, is there a way to forcefully supersede the decision of query optimizer and dictate it to treat tableB as inner table? I am just curious to see how do the two different executions of the same query compare to each other. Thanks.

CuriousLearner
  • 361
  • 1
  • 6
  • 14
  • 1
    Just a note, there is no "inner" table in an inner join, inner describes the type of join being made. – Uueerdo Dec 17 '18 at 20:19
  • 1
    You should take into account index statistic that server have for the data in these tables. In some situations server might decide not to use index at all if there is not sense to do this based on the statistic. – fifonik Dec 17 '18 at 20:55
  • Please act on [mcve] including giving DDL & EXPLAIN output. Read everything in the MySQL manual re indexes & optimization including [statistics](https://dev.mysql.com/doc/refman/8.0/en/optimizer-statistics.html). What happened when you ran your query both ways using STRAIGHT JOIN? PS The "error I am making in my thought process" is thinking that an uninformed guess is relevant. To start to inform yourself first Read The Fine Manual for every feature you are using. Then what did your googling with site:[so] & with site:[dba.se] tell you? See [ask] & the voting arrow mouseover texts. – philipxy Dec 18 '18 at 03:02
  • Possible duplicate of [Is there a way to force MySQL execution order?](https://stackoverflow.com/q/3456134/3404097) – philipxy Dec 18 '18 at 03:24

1 Answers1

3

In InnoDB, primary key index lookups are marginally more efficient than secondary index lookups. The optimizer is probably preferring to run the join that does lookups against tableB.id because it uses the primary key index.

If you want to override the optimizer's ability to reorder tables, you can use an optimizer hint. The tables will be accessed in the order you specify them in your query.

SELECT *
FROM tableA 
STRAIGHT_JOIN tableB
ON tableA.columnA = tableB.id

That syntax should work in any currently supported version of MySQL.

That will give you the opportunity to time query with either table order, and see which one in fact runs faster.

There's also new syntax in MySQL 8.0 to specify join order with greater control: https://dev.mysql.com/doc/refman/8.0/en/optimizer-hints.html#optimizer-hints-join-order

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828