-4

Query:

SELECT * FROM TABLE1 TBL
INNER JOIN CROSS_REF_TABLE XREF
ON TBL.COL = XREF.COL

VS

SELECT * FROM TABLE1 TBL
LEFT JOIN CROSS_REF_TABLE XREF
ON TBL.COL = XREF.COL
WHERE XREF.COL IS NOT NULL

Can Left join be faster than inner join ? In my case for some reason I see the LEFT JOIN is faster.

NOTE: All the tables have random distribution.

When I look at the execution plan the difference between two is with the added distribution on cross reference table.

LEFT JOIN:

Node 32.
  [SPU Sequential Scan table "CROSS_REF_TABLE XREF" as "XREF" {}]
      -- Estimated Rows = 129044, Width = 18, Cost = 0.0 .. 0.2, Conf = 100.0
      Projections:
        1:XREF.ORG_SK  2:XREF.COL
  [SPU Distribute on {(XREF.COL)}]
  [HashIt for Join] 

INNER JOIN:

Node 3.
  [SPU Sequential Scan table "CROSS_REF_TABLE XREF" as "XREF" {}]
      -- Estimated Rows = 129044, Width = 18, Cost = 0.0 .. 0.4, Conf = 100.0
      Projections:
        1:XREF.ORG_SK  2:XREF.COL
  [HashIt for Join] 

Does Netezza LEFT JOIN do Nested Loop or Hash Join?

VLL
  • 9,634
  • 1
  • 29
  • 54
anayabu
  • 31
  • 1
  • 7
  • 1
    Surely the join operation it picks will be based on the estimated number of rows, not on the choice of left or inner join. – TZHX Oct 24 '16 at 08:33
  • 1
    Both queries in your question are identical and are using an outer join –  Oct 24 '16 at 08:53
  • As Inner & Left join return two different result the main question should be: *What result do I need?* – dnoeth Oct 24 '16 at 08:59
  • The question is not about what the result would be . We can very well achieve what we want want by adding additional conditions to where.c lause – anayabu Oct 24 '16 at 14:36
  • The question is similar to http://stackoverflow.com/questions/2726657/inner-join-vs-left-join-performance-in-sql-server – anayabu Oct 24 '16 at 14:37
  • That is answered for SQL server. Want to know how does it behave in netezza – anayabu Oct 24 '16 at 14:38

1 Answers1

0

Although this MAY be true in some cases, the statement in the header is extremely over-simplified, and indicates a causality where only a limited correlation exists.

Had you changed you topic to 'could it happen in real life that inner join and left join are not equal in speed even though they are equal in result', all answers would be 'yes'

There are literally hundreds of examples of that in SQL (= versus >= AND <=, exists versus 'in sub-select', like versus equal, like versus between, event_time=(sub-select of max(event_time) per key versus join on key to select key,max(event_time) etc, etc) but the results will differ from database to database, and even across software versions of the same database.

One thing that ALWAYS tends to even out these differences (and generally improve performance) is to update your statistics on all tables involved in the query, and limit the number of joins between the large table and the whereclauses that limits the final result set the most (avoid snowflake - stick to star)

Lars G Olsen
  • 1,093
  • 8
  • 11