Query where two columns are in the result of nested query

Question

I'm writing a query like this:

select * from myTable where X in (select X from Y) and XX in (select X from Y)

Values from columns X and XX has to be in the result of the same query: select X from Y.

I think that this query is invoked twice so its senseless. Is there any other option I can write this query more efficiently? Maybe temp table?

RichardTheKiwi · Answer 1 · 2011-01-26T09:50:51.740

4

Actually no, there isn't a smarter way to write this (without visiting Y twice) given the X that myTable.X and myTable.YY matches to may not be from the same row.

As an alternative, the EXISTS form of the query is

select *
from myTable A
where exists (select * from Y where A.X = Y.X)
  and exists (select * from Y where A.XX = Y.X)

If Y contains X values of 1,2,3,4,5, and x.x = 2 and x.xx = 4, they both exist (on different records in Y) and the record from myTable should be shown in output.

EDIT: This answer previously stated that You could rewrite this using _EXISTS_ clauses which will work faster than _IN_. AS Martin has pointed out, this is not true (certainly not for SQL Server 2005 and above). See links

edited Jan 26 '11 at 09:50

answered Jan 26 '11 at 01:29

RichardTheKiwi

105,798
26
196
262

1

+1, but it won't necessarily be faster with exists - my simple test gave the exact same query plan for both approaches. The query optimizer is pretty smart, it always pays to check the plans when "optimizing" your sql. – Blorgbeard Jan 26 '11 at 01:39
+1 I had to try a few others offline before agreeing but this is the most efficient... – Tahbaza Jan 26 '11 at 02:06
everyone have had bad days, don't take it to heart :-). I'll upvote to neutralize this, because your answer is clearly correct (though my answer is better regarding original question :-) – Alexander Malakhov Jan 26 '11 at 04:52
1

In my error, I downvoted. Alex courteously reminded me that my understanding was incorrect -- you called me an idiot (direct quote, you've since edited the comment after my threat of alerting the mods), and told me to delete it immediately. Now you feel the need to plaster my name to demonstrate the misunderstanding, and demand a reversal. Yeah buddy, I'll get right on that. While we're at it, what about [reversing the downvote on my answer earlier](http://stackoverflow.com/questions/4796778/conversion-of-legacy-outer-join-to-ansi)? If you don't like it, don't do it to others. – OMG Ponies Jan 26 '11 at 05:00
@Alexander Malakhov: It's bigger than this question - he's been hounding me for days now. I used to be flattered... If you'd seen his behavior (calling an ANSI-89 join a CROSS JOIN/cartesian product, etc), I think you'd have hesitated to upvote. – OMG Ponies Jan 26 '11 at 05:01
@omg "threat"? Yes you've been "threat"-happy all day. mods can follow all questions where we've crossed and get a better picture than one link you post. You've been pushing my buttons all day. Did I do anything wrong here http://stackoverflow.com/questions/4798698/ to elicit your informed comments? – RichardTheKiwi Jan 26 '11 at 05:18
They'd also see what came first (that'd be you). Secondly, my comments stand as they are. Note that there's no name calling, no ad hominem attack(s) against you. Nothing substantiates your actions. Also, the fact you're not arguing your downvote against me earlier gives weight that the downvote was cast by you. – OMG Ponies Jan 26 '11 at 05:28
1

`in` and `exists` do [get treated the same in SQL Server](http://sqlinthewild.co.za/index.php/2009/08/17/exists-vs-in/) AFAIK. – Martin Smith Jan 26 '11 at 07:58
@omg - I was actually referring to your uninvited comments against my answer not comments against the question itself. All three of them that added no value to the question. I invited you to inspect my answers against many other questions to support your baseless, unnecessary, very personal accusation: I quote less you delete it `"Why does it take threat of public shaming for you to give a real answer?"` Haven't heard back yet – RichardTheKiwi Jan 26 '11 at 09:17
You posted a vague answer that didn't provide anything beyond what had already been mentioned in the comments against the question. Not sure what "uninvited" has to do with the fact that every answer can be commented against for those with 50+ rep. Anyone is free to scrutinize your answer, get over yourself already. – OMG Ponies Jan 26 '11 at 15:20

Thomas · Answer 2 · 2011-01-26T16:16:35.560

It will probably not be particularly efficient to try to write this query by only referencing Y once. However, given that you are using SQL Server 2008, there are variations that can be used:

Select ...
From MyTable As T
Where Exists    (
                Select 1
                From Y
                Where Y.X = T.X
                Intersect
                Select 1
                From Y
                Where Y.X = T.XX
                )

Addition

Actually, I can think of a way you could do it without using Y more than once (Nothing was said about using MyTable more than once). However, this is more for academic reasons as I think that using my first solution will likely perform better:

Select ...
From MyTable As T
Where Exists    (
                Select 1
                From Y
                Where Exists( 
                            Select 1 
                            From MyTable1 As T1 
                            Where T1.X = Y.X 
                            Intersect
                            Select 1 
                            From MyTable1 As T2 
                            Where T2.XX = Y.X
                            )
                    And Y.X In(T.X, T.XX)
                )

Alexander Malakhov · Answer 3 · 2011-01-26T03:29:20.380

0

WITH
   w_tmp AS(
      SELECT x
        FROM y
   )
SELECT * 
  FROM myTable
 WHERE x IN (SELECT x FROM w_tmp)
   AND xx IN (SELECT x FROM w_tmp)

(I've read this in Oracle docs, but I think MS able to do this optimizations too)
This way optimizer knows for sure that you are doing same query and can create temporary table to cash results (But it's still up to optimizer to decide whether it's worth it. For tiny queries, overhead of creating temp table can be too high).

Also (and actually this is way more important for me), when subquery is 50 lines, it's easier for human to see, that the same thing is used in both cases. Pretty much like factoring long functions into subroutines

Docs on MSDN

edited Jan 26 '11 at 03:29

answered Jan 26 '11 at 02:49

Alexander Malakhov

3,383
2
33
58

it may surprise you but the CTE w_tmp is `evaluated twice`, yes it is. Check the query plan. It is not stored in some temp memory. – RichardTheKiwi Jan 26 '11 at 02:51
@cyberkiwi: It's up to optimizer to decide whether it's worth it. For tiny queries, overhead of creating temp table can be too high. Thanks for comment, I'll update post to clarify this. – Alexander Malakhov Jan 26 '11 at 03:22
Oracle has hints to materialize a CTE (Oracle calls it "subquery factoring"), but otherwise there's no performance benefit to using a CTE. – OMG Ponies Jan 26 '11 at 04:02
@OMG Ponies: it can do it without hint. See Tom Kyte's 2nd answer at http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:128693000346554280. Or example in this article http://www.oracle-developer.net/display.php?id=212 – Alexander Malakhov Jan 26 '11 at 04:46
Tom didn't clarify versions - see: http://jonathanlewis.wordpress.com/2007/07/26/subquery-factoring-2/ – OMG Ponies Jan 26 '11 at 05:05

Schultz9999 · Answer 4 · 2011-01-26T08:01:57.990

0

Not sure what the problem is but isn't simple JOIN an answer?

SELECT t.* 
FROM myTable 
JOIN Y y1 ON y1.X = myTable.X
JOIN Y y2 ON y2.X = myTable.XX

or

SELECT t.*
FROM myTable, Y y1, Y y2
WHERE y1.X = myTable.X AND y2.X = myTable.XX

ADDED: if there is a strong need to eliminate a second query for Y, let's reverse the logic:

;WITH A(X)
AS (
  -- this will select all values that can be found in Y and myTable X and XX fields.
  SELECT Y.X -- if there are a lot of dups, add DISTINCT
  FROM Y, myTable
  WHERE Y.X IN (myTable.X, myTableXX)
)
-- now join back to the orignal table and filter.
SELECT t.* 
FROM myTable
-- similar to what has been mentioned before 
WHERE EXISTS(SELECT TOP 1 * from A where A.X = myTable.X)
      AND EXISTS(SELECT TOP 1 * from A where A.X = myTable.XX)

If you don't like WITH, you may use SELECT INTO clause and create in-memory table.

edited Jan 26 '11 at 08:01

answered Jan 26 '11 at 05:59

Schultz9999

8,717
8
48
87

The problem is: exactly same query `select X from Y` is executed twice – Alexander Malakhov Jan 26 '11 at 07:15
You don't trust SQL Server caching and optimizing strategies? – Schultz9999 Jan 26 '11 at 07:27
Added another way. Weird one. – Schultz9999 Jan 26 '11 at 07:54

Query where two columns are in the result of nested query

4 Answers4

Linked