0

I have a query sub query below (Purpose search top 2 orders from order table whose freight_charges=2 and get the customers of those orders)

  1. Working query with IN clause
    SELECT *
    FROM Customers C
    WHERE C.CUST_ID IN (
      SELECT TOP 2 CUST_ID
      FROM Orders O
      where FREIGHT_CHARGES = 2
    ) 

However I want to convert the IN clause into a more efficient EXISTS clause

  1. Code not working with exists clause
    SELECT *
    FROM Customers C
    WHERE EXISTS (
        SELECT TOP 2 CUST_ID
        FROM Orders O
        where FREIGHT_CHARGES = 2 AND C.CUST_ID = O.CUST_ID
    )

In case of the second query I am not retrieving the top 2 cust_id BUT all the records.

Please let me know any implementation to get the desired result.

Edited: Using INNER JOIN as suggested in the answers . I am able to get the correct result. However since I don't want to retrieve any record from Orders table I thought exists would be a better approach performance wise.

    SELECT C.*
    FROM Customers C
    INNER JOIN (
      SELECT TOP 2 CUST_ID
      FROM Orders 
      where FREIGHT_CHARGES = 2
    ) O ON C.CUST_ID=O.CUST_ID
  • Aside from your problem, a `TOP X` query doesn't make any sense without an `ORDER BY` clause. Otherwise, you're just selecting X rows at random (in a sense). – TT. Feb 20 '20 at 06:26
  • I might add the order by clause however isn't it by default ordering with the primary key? As said I am getting the desired result using the 1st query but if I need to use the EXISTS clause I am getting the issue. Read through the below link as well but could not get the answer https://stackoverflow.com/questions/34759173/any-point-in-using-limit-in-exists-query – Sangbaran Banerjee Feb 20 '20 at 06:48
  • Without an order by clause, ordering is arbitrary (and so, might or might not order by using the primary key depending on what the execution plan dictates is most efficient). This is so by definition. No order by clause => arbitrary ordering. – TT. Feb 20 '20 at 07:03
  • @TT. Thanks for this suggestion I will add the order by clause as well. – Sangbaran Banerjee Feb 20 '20 at 07:46
  • Who said that `EXISTS` is more efficient than `IN`? If both expressions are logically equivalent (not your case!), then SQL server most often produces identical query plans anyway. – Arvo Feb 20 '20 at 07:50
  • @Arvo . I am just going through contents over the internet and got to know that exists and joins always have better performance than a IN clause. Thanks. – Sangbaran Banerjee Feb 20 '20 at 09:29
  • Like I said, when queries are logically identical, then performace is usually same. Do not believe everything on internetz :) – Arvo Feb 20 '20 at 09:40

4 Answers4

0

If you mix EXISTS with TOP there no difference than you use it without TOP, because EXISTS "Returns TRUE if a subquery contains any rows".

In your case, you can check queries:

SELECT *
FROM Customers C
  WHERE EXISTS (
    SELECT 1
    FROM Orders O
    JOIN Orders O2
      ON O.FREIGHT_CHARGES = O2.FREIGHT_CHARGES 
      AND O.CUST_ID = O.CUST_ID
      AND O.ID <> O2.ID
    where O.FREIGHT_CHARGES = 2 AND C.CUST_ID = O.CUST_ID
   )

Or

SELECT *
FROM Customers C
WHERE EXISTS (
    SELECT 1
    FROM Orders O
    where FREIGHT_CHARGES = 2 AND C.CUST_ID = O.CUST_ID
    GROUP BY FREIGHT_CHARGES, CUST_ID
    HAVING COUNT(1) > 1
)

You must test which one is better in your case

Leszek Mazur
  • 2,443
  • 1
  • 14
  • 28
0

I can suggest you should go for INNER JOIN. It is always a good practice to avoid subqueries if possible as it gets executed against each row from master table because it in WHERE condition. Rather, you can use simple INNER JOIN to get the desired output.

Like in your example, you just need TOP 2 customers having the FREIGHT_CHARGES value to 2. You can use below query using INNER JOIN.

SELECT
    TOP 2
    C.*
FROM Customers C
INNER JOIN Orders O ON C.CUST_ID = O.CUST_ID
WHERE O.FREIGHT_CHARGES = 2
ORDER BY C.CUST_ID -- Use 'ORDER BY' clause if 'C.CUST_ID' column does not have clustered index or you can use any other column

Please let me know your feedback.

iVad
  • 563
  • 2
  • 4
  • 13
  • The query you posted would not work if I have same 1 customer with more than 1 orders in order table. However I will take the inner join suggestion you gave. I was a bit confused with the performance of inner join and exists and was not sure which one to use. – Sangbaran Banerjee Feb 20 '20 at 07:39
0

The EXISTS operator is a logical operator that allows you to check whether a subquery returns any row. The EXISTS operator returns TRUE if the subquery returns at least one or more row.

So, since every row in Order table with FREIGHT_CHARGES = 2 will return TRUE for EXISTS statement (does not matter if it is TOP 2, 1 would be enough) the result includes every records in Customers table with proper CUST_ID's in Orders.

In your case, EXISTS won't be a good choice for you.

Please check this link: https://www.sqlservertutorial.net/sql-server-basics/sql-server-exists/

Mehmet
  • 1,435
  • 4
  • 13
  • 15
0
   SELECT Top 2 *
    FROM Customers as C
    WHERE EXISTS (
        SELECT O.CUST_ID
        FROM Orders as O
        where FREIGHT_CHARGES = 2 AND C.CUST_ID = O.CUST_ID
    )
  • While this link may answer the question, it is better to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes. – borchvm Feb 21 '20 at 07:45