2

Lets say I have a function, POINT_IN_SHAPE('shape_name', x, y) that is an expensive-ish call, so I would want to reduce the amount of times it would be called. One way to prevent calling the function is to draw a bounding box around the shape. For example, if a point's x is greater than the maximum x value in the shape, then it's not inside.

if(x > max_X) return outside;

With this in mind, would adding that condition to my WHERE clause speed up the query by reducing the amount of rows that POINT_IN_SHAPE() needs to run on? Let's say there in an index on x, and POINT_IN_SHAPE() does not do a bounding box check.

select * from points where x <= max_X and POINT_IN_SHAPE(...);

vs

select * from points where POINT_IN_SHAPE(...);

I'm hoping the conditions reduce the set of rows for other conditions, but I'm not sure how SQL works. I'm also hoping that it'll be smart enough to reduce rows first using the cheaper condition.

I don't really care if the ordering of the conditions matters, I'm just wondering if one condition would reduce the "search space" for another condition. From the comments, answers, links to other threads, and a bit of testing, it looks like the conditions do reduce the search space for other conditions in the where clause.

Kara
  • 6,115
  • 16
  • 50
  • 57
GuitarStrum
  • 713
  • 8
  • 24
  • In theory, it depends on how (in which order) the query planner decides to evaluate this, but in practice I would assume that your approach work, especially if there is an index on `x`. – Thilo Jan 25 '17 at 12:59
  • Possible duplicate of [Execution order of conditions in SQL 'where' clause](http://stackoverflow.com/questions/340139/execution-order-of-conditions-in-sql-where-clause) – Thilo Jan 25 '17 at 13:05
  • https://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:2153739600346754734 – Thilo Jan 25 '17 at 13:08
  • Regarding the EDIT: Of course you do care. The inequality does reduce the search space either way. What you want is for it to reduce the search space BEFORE the functional condition is evaluated. So you do care about the order in which the conditions are applied. –  Jan 25 '17 at 13:54
  • I know I care, I mean for the sake of the question I just want to know if a condition does or does not reduce the search space for another condition. regardless of execution order. Which, as I stated in the edit, looks like is the case based on everything that has been said thus far. That said, yes, execution order is important because of reducing the search space. – GuitarStrum Jan 25 '17 at 14:17

3 Answers3

2

In your example, it is almost certain that the optimizer will use the condition x <= max_x first. Why? Because some conditions are "access predicates" (conditions the engine uses to determine which rows will be looked at in the first place), while other conditions are "filter predicates" (conditions used simply to decide which rows should be selected and which should be discarded). "Access predicates" are evaluated before "filter predicates" (obviously, if you think about what that means).

In your case the optimizer will decide to use x <= max_x as access predicate, especially with an index on the x column. So it will use that condition first.

Gordon shows the standard way to "force" the order of evaluation of conditions separated by AND. (It would work the same for conditions separated by OR.) Do remember this trick, it's quite useful - even though perhaps in this situation it is not needed. It might be needed if your query was more complex; for example, if in addition to x <= max_x you also had a condition like state = 'MN'. Then the optimizer may choose to use state = 'MN' as access predicate, and both x <= max_x and your functional condition as filter predicates. It is still likely that it will evaluate the inequality first, but you could use Gordon's trick just to make sure.

  • Interesting. I did a little digging and found that x and y are in fact being used as "access predicates," while the function is being used as a filter predicate. I'll have to read into those. Thanks. – GuitarStrum Jan 25 '17 at 14:16
1

Oracle does not guarantee the ordering of conditional expressions (no database does). So, mere ordering is not sufficient. The same might be true of using subqueries -- Oracle reserves the right to rearrange expressions if it thinks doing so makes things more efficient. The problem is that the optimizer could be wrong.

This is one of the situations where a case in the where clause can be appropriate.

where (case when x >= max_x then 'outside'
            when POINT_IN_SHAPE(...) then 'inside'
       end) = 'inside'

The conditions in the case are guaranteed to be evaluated sequentially. Note: this just uses strings 'inside' and 'outside' for clarity.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • But with an index on `x` (the use of which you could force with a hint), it seems pretty much guaranteed in practice that the first clause gets evaluated first (and even pushed into the access path). – Thilo Jan 25 '17 at 13:01
  • @Thilo . . . You are correct. Hints are an alternative method for ensuring the ordering. – Gordon Linoff Jan 25 '17 at 13:06
  • In practice, I'd first see if the Cost-based-optimizer (which does try to take the cost of calling functions into account) gets it right. This case seems pretty straighforward, so my guess is you'll be fine. If not (and for more complex queries the optimizer does get it wrong, occasionally catastrophically so), then you have to force it with tricks like hints or this case pattern above (nice one!), and this gets ugly. There is certainly no simple, explicit API to coerce execution order. – Thilo Jan 25 '17 at 13:11
  • @Thilo . . . That can be dangerous for an application. The optimizer might make the right decision today, but changes to the data might cause it to change its mind. (I will admit that I've had this problem more with databases other than Oracle.) – Gordon Linoff Jan 25 '17 at 13:14
  • Yes, that is an issue. And there is no good solution within the "normal" API or query language. It gets very arcane very soon (does not mean you don't have to resort to these things, but it should be that: a last resort). In the case at hand, it seems pretty hard for the optimizer to get wrong. – Thilo Jan 25 '17 at 13:19
0

It depends on how your DB engine will execute this query, what execution plan engine chooses.

Other way to write query is to first filter desired record based on condition in a sub query and after that apply function call on filtered output.

In your case you can try this.

 select POINT_IN_SHAPE('shape_name', x, y) 
    from (select * from points where x <= max_X) data
MANISH KUMAR CHOUDHARY
  • 3,396
  • 3
  • 22
  • 34
Rakesh
  • 84
  • 7
  • The way the sub-query gets executed in also completely up the optimizer. So without forcing its hand (via hints) this should not make any difference. – Thilo Jan 25 '17 at 13:13