-3

I have three tables that I need to JOIN to get values from two columns. These columns are GRN_STATUS and STATUS I have written some SQL that achives the desired result but I've been advised that using INis very inefficient and that I should use EXISTS instead.

I'm just wondering is this true in my situation? and what would a solution using EXISTS instead of IN look like?

SQL:

SELECT c.GRN_STATUS, a.STATUS
FROM
    TableA a
        INNER JOIN
    TableB b
        ON a.ORD_NO = b.ORD_NO
        AND a.COMPANY_ID = b.COMPANY_ID
        INNER JOIN
    TableC c
        ON b.GRN_NO = c.GRN_NO
        AND b.COMPANY_ID = c.COMPANY_ID
AND a.STATUS IN ( 'B', 'C', 'D', 'E' )
AND c.GRN_STATUS = 'A';
JTK
  • 1,469
  • 2
  • 22
  • 39
  • Depending on the size of the list to test. In your case, `IN` and `EXSITS` should have the same performance. – Zohar Peled Nov 17 '16 at 12:21
  • 1
    I think the answer to the performance part of your question might be database specific. That being said, what database are you using? – Tim Biegeleisen Nov 17 '16 at 12:21
  • Assuming the list could get very large, would `EXSITS` preform better? – JTK Nov 17 '16 at 12:22
  • Depends on the dbms optimizer. Tag the dbms! – jarlh Nov 17 '16 at 12:24
  • @msanz, if SQL Server is used... – jarlh Nov 17 '16 at 12:24
  • 4
    You have a list of constants. `EXISTS` doesn't make sense. You should also qualify all column names in the query so we know where the columns are coming from. – Gordon Linoff Nov 17 '16 at 12:27
  • The DB could be Oracle or SQL Server. – JTK Nov 17 '16 at 12:27
  • @Gordon Linoff I thought it might not make sense to use EXISTS in my situation, are sure of that? – JTK Nov 17 '16 at 12:30
  • Probably not the best example to compare EXISTS versus IN. If it was something like `AND STATUS IN (SELECT code FROM statuscodes)` instead of a hard coded list it could matter. Depending on the RDBMS version, query optimizer, indexes, etc. – LukStorms Nov 17 '16 at 12:37

1 Answers1

1

In general, it depends on the implementation in the DBMS.

EXISTS mostly stops and returns at the first match so it COULD be more efficient, but it makes no sense when you have a list of constants.

Since SQL is a declarative language, you can't tell the DBMS the how, just the what. You describe the expected result and it is up to the server to try to find the most efficient way to fulfill your request.

The way the DBMS finds the efficient algorithm is based on several things including the amount and the distribution of the data, the actual statistics, the expected resources needed, etc.

So EXISTS may perform better on a huge table, while has no effect on smaller ones (or vica versa).

Your best bet is to actually check the estimated query plans or try them out.

My personal view is to use EXISTS when no data is required and JOIN when data is required. IN is for constant lists.

Pred
  • 8,789
  • 3
  • 26
  • 46
  • I wonder what I've missed to get a downwote on this. Not because of the points on the site, but just pure curiosity. – Pred Nov 17 '16 at 13:11
  • Down-voted (sorry for the delay with explanation). IN based on sub-query and its equivalent EXIST has exactly the same logical meaning therefore can be implemented exactly the same. Claiming that there are performance differences without at least mentioning vendor, version and **execution plan** is unprofessional. P.s. In different databases you can tell the database the how or how not, or at least imply. – David דודו Markovitz Nov 17 '16 at 15:28
  • P.s. I urge you to think what is the meaning of "stops and returns at the first match" for hash join and merge join. – David דודו Markovitz Nov 17 '16 at 15:29
  • @DuduMarkovitz: IN and EXISTS are not equivalent. You can achieve the same result, but saying that they are equivalent is not true. Consider this: you want to check if there is a record which has the same values in two or more columns in another table. How can you achieve this with IN? The execution plan is generated based on statistics and other measures so what kind of EP do you want to include here for a general question without any data? It might be different based on the data. I agree on the "stops and returns" part it is poorly worded. And about the vendor thing: read the first sentence. – Pred Nov 17 '16 at 15:49
  • (1) I didn't say they are equivalent. I said "IN based on sub-query **and its equivalent** EXIST" http://stackoverflow.com/a/39918818/6336479 (2) Some database supports multi-column IN. (3) I expect to hear that claimer at least compared execution plans (4) As I were saying. If you want to claim somthing claim it on a specific vendor and version – David דודו Markovitz Nov 17 '16 at 15:57
  • As you wish: equivalent IN and EXISTS from the vendor's pen: https://dev.mysql.com/doc/refman/5.5/en/subquery-optimization-with-exists.html | An article from a more professional person and using another vendor's product: http://blog.sqlauthority.com/2010/06/05/sql-server-convert-in-to-exists-performance-talk/ – Pred Nov 17 '16 at 16:05
  • This is exactly what I'm talkin about. Article from 2010 about SQL SERVER 2008. Is it valid for SQL SERVER 2012? 2014? 2016? and although the topic is **"Convert IN to EXISTS"** , the comparison is **not between EXISTS and IN but EXISTS and =**, which are logically different since = requires the inner query to return a single value or the query fails. And off course there are no execution metrics only estimations of the execution plan. – David דודו Markovitz Nov 17 '16 at 18:33
  • Yes, mostly EXISTS performs better in my experience when the goal is to check if there is correlation or even lack of correlation between two tables (talking about 2014 and 2016 SQL Server), however it depends on the actual data and query. I am aware that DBMSes are evolving and as I mentioned in the original answer, check the execution plan. Always. I was talking in general, and it still stands in my opinion, maybe the wording is not the best, I agree with this. You know, I know, everybody knows that performance questions in the database world mostly standing only for the actual query. – Pred Nov 17 '16 at 18:46
  • So we have a whole answer on something that "mostly standing only for the actual query" for a post that is not even tagged with a DB type and version. That how is goes, everybody knows. – David דודו Markovitz Nov 17 '16 at 19:13
  • Yes, we have a generic answer for a generic question. Not scope of this site? You have access to the close. – Pred Nov 17 '16 at 19:56