14

Problem

I need to better understand the rules about when I can reference an outer table in a subquery and when (and why) that is an inappropriate request. I've discovered a duplication in an Oracle SQL query I'm trying to refactor but I'm running into issues when I try and turn my referenced table into a grouped subQuery.

The following statement works appropriately:

SELECT  t1.*  
FROM    table1 t1, 
INNER JOIN table2 t2 
        on t1.id = t2.id        
        and t2.date = (SELECT max(date) 
                       FROM   table2  
                       WHERE  id = t1.id) --This subquery has access to t1

Unfortunately table2 sometimes has duplicate records so I need to aggregate t2 first before I join it to t1. However when I try and wrap it in a subquery to accomplish this operation, suddenly the SQL engine can't recognize the outer table any longer.

SELECT  t1.* 
FROM    table1 t1, 
INNER JOIN (SELECT * 
            FROM  table2 t2
            WHERE t1.id = t2.id              --This loses access to t1
              and t2.date = (SELECT max(date) 
                             FROM   table2 
                             WHERE  id = t1.id)) sub on t1.id = sub.id 
                             --Subquery loses access to t1

I know these are fundamentally different queries I'm asking the compiler to put together but I'm not seeing why the one would work but not the other.

I know I can duplicate the table references in my subquery and effectively detach my subquery from the outer table but that seems like a really ugly way of accomplishing this task (what with all the duplication of code and processing).

Helpful References

  • I found this fantastic description of the order in which clauses are executed in SQL Server: (INNER JOIN ON vs WHERE clause). I'm using Oracle but I would think that this would be standard across the board. There is a clear order to clause evaluation (with FROM being first) so I would think that any clause occuring further down the list would have access to all information previously processed. I can only assume my 2nd query somehow changes that ordering so that my subquery is being evaluated too early?

  • In addition, I found a similar question asked (Referencing outer query's tables in a subquery ) but while the input was good they never really explained why he couldn't do what he is doing and just gave alternative solutions to his problem. I've tried their alternate solutions but it's causing me other issues. Namely, that subquery with the date reference is fundamental to the entire operation so I can't get rid of it.

Questions

  • I want to understand what I've done here... Why can my initial subquery see the outer table but not after I wrap the entire statement in a subquery?

  • That said, if what I'm trying to do can't be done, what is the best way of refactoring the first query to eliminate the duplication? Should I reference table1 twice (with all the duplication that requires)? Or is there (probably) a better way of tackling this problem?

Thanks in advance!

------EDIT------

As some have surmised these queries above are not the actually query I'm refactoring but an example of the problem I'm running into. The query I'm working with is a lot more complicated so I'm hesitant to post it here as I'm afraid it will get people off track.

------UPDATE------

So I ran this by a fellow developer and he had one possible explanation for why my subquery is losing access to t1. Because I'm wrapping this subquery in a parenthesis, he thinks that this subquery is being evaluated before my table t1 is being evaluated. This would definitely explain the 'ORA-00904: "t1"."id": invalid identifier' error I've been receiving. It would also suggest that like arithmetic order of operations, that adding parens to a statement gives it priority within certain clause evaluations. I would still love for an expert to weigh in if they agree/disagree that is a logical explanation for what I'm seeing here.

Community
  • 1
  • 1
  • 1
    A derived table can't be correlated. It must stand alone. Though you can join onto it. you can use `APPLY` where you need a sort of correlated derived table. – Martin Smith Nov 21 '13 at 14:04
  • Martin.... sincerely, thank you! Based on your comment I was able to do a little additional research and found that my query isn't actually pulling data how I originally thought. Also your advice to use Apply seems very applicable (though for me syntax is a little different as I'm using Oracle). Very much appreciate the advice - in fact if you submit it as an answer I'll mark it as the correct one. –  Nov 21 '13 at 14:37

3 Answers3

6

So I figured this out based on the comment that Martin Smith made above (THANKS MARTIN!) and I wanted to make sure I shared my discovery for anyone else who trips across this issue.

Technical Considerations

Firstly, it would certainly help if I used the proper terminology to describe my problem: My first statement above uses a correlated subquery:

This is actually a fairly inefficient way of pulling back data as it reruns the subquery for every line in the outer table. For this reason I'm going to look for ways of eliminating these type of subqueries in my code:

My second statement on the other hand was using what is called an inline view in Oracle also known as a derived table in SQL Server:

An inline view / derived table creates a temporary unnamed view at the beginning of your query and then treats it like another table until the operation is complete. Because the compiler needs to create a temporary view when it sees on of these subqueries on the FROM line, those subqueries must be entirely self-contained with no references outside the subquery.

Why what I was doing was stupid

What I was trying to do in that second table was essentially create a view based on an ambiguous reference to another table that was outside the knowledge of my statement. It would be like trying to reference a field in a table that you hadn't explicitly stated in your query.

Workaround

Lastly, it's worth noting that Martin suggested a fairly clever but ultimately inefficient way to accomplish what I was trying to do. The Apply statement is a proprietary SQL Server function but it allows you to talk to objects outside of your derived table:

Likewise this functionality is available in Oracle through different syntax:

Ultimately I'm going to re-evaluate my entire approach to this query which means I'll have to rebuild it from scratch (believe it or not I didn't create this monstrocity originally - I swear!). A big thanks to everyone who commented - this was definitely stumping me but all of the input helped put me on the right track!

Community
  • 1
  • 1
  • +1 Thinking of them as inline views explains the problem well. [More recent versions of Oracle do support APPLY though](http://technology.amis.nl/2013/06/26/oracle-database-12c-joining-and-outer-joining-with-collections/) – Martin Smith Nov 22 '13 at 01:16
  • Thanks for the info... I'm still using 11G so that functionality is news to me! –  Nov 22 '13 at 13:16
  • nice answer , but still unable to get the sharp edge between inline view and correlated query, when to consider a query as this and when as that ? – osama yaccoub Feb 25 '16 at 15:10
  • @osamayaccoub An [inline view](http://www.orafaq.com/wiki/Inline_view) is a subquery that is specifically located in the FROM statement of the outer query. A [correlated subquery](https://en.wikipedia.org/wiki/Correlated_subquery) on the other hand is a subquery which makes a reference to values of the outer query. Because of the nature of a correlated subquery, you can't have one of these in your FROM statement. –  Feb 25 '16 at 18:21
  • but in the example you mentioned (the code that wasn't working) ... the subquery wasn't in the FROM clause – osama yaccoub Feb 25 '16 at 18:35
  • 1
    @osamayaccoub I see your confusion now... yes, my subquery above had been on the `JOIN` and while that clause is an entity to itself for this case, think of it as a subset of the `FROM` clause. Focus on how the parser will dissect my code and create a plan for execution. Evaluating the `FROM` and `JOIN` clauses will be it's very first steps as it will need to know what resources to manage. But as it is trying to pull in it's initial resources and create a temp table/ view (behind the scenes) to house my subquery data, I am trying to refer to values that it has not yet evaluated. –  Feb 25 '16 at 18:52
  • 1
    @osamayaccoub Martin described it best above... "A derived table can't be correlated." I am trying to have the parser create a temp table for me as part of my FROM/JOIN clauses... so correlation is not allowable. –  Feb 25 '16 at 18:53
0

How about the following query:

SELECT t1.* FROM 
(
  SELECT * 
  FROM 
  (
    SELECT t2.id,
    RANK() OVER (PARTITION BY t2.id, t2.date ORDER BY t2.date DESC) AS R  
    FROM table2 t2
  )
  WHERE R = 1
) sub 
INNER JOIN table1 t1 
ON t1.id = sub.id
Linger
  • 14,942
  • 23
  • 52
  • 79
  • Thanks for pointing that out - I've edited my example above. I'm not actually pulling in t2.* in the 2nd query. That was a typo for this post –  Nov 20 '13 at 21:07
  • No, it gives the same compile error. It cannot recognize t1 if the Inner Join references a subquery without explicitely stating From table1. The specific error is ORA-00904: "t1"."id": invalid identifier –  Nov 20 '13 at 21:12
  • That's a nice try but I still get the same error. The issue is with the order that it's being evaluated... it's trying to evaluate that subquery before defining t1 which is why t1 is not recognized. –  Nov 20 '13 at 21:40
  • Hey! Nice job, it compiled (+1 for that!)! I'll have to play with this method to see if I can get it to do what I need... Unfortunately this solution is currently performing poorly (I never mentioned but table2 has 26+ million records in it so it's a beast to work with). I appreciate your help... you are definitely getting me thinking about different ways to approach the problem! Thank you! –  Nov 20 '13 at 22:05
-1

In your second example you are trying to pass the t1 reference down 2 levels.. you can't do that, you can only pass it down 1 level (which is why the 1st works). If you give a better example of what you are trying to do, we can help you rewrite your query as well.

Craig
  • 5,740
  • 21
  • 30
  • That's what I thought at first as well but if I replace the (SELECT max(date) FROM table2 WHERE id = t1.id)) with a single date, it still won't compile. The problem is doing an inner join directly on a subquery that references the outer table. But why? –  Nov 20 '13 at 21:00