7

The following query returns the venues near us (lat: 62.0, lon: 25.0) inside whose radius we fall in ordered by distance:

SELECT *, 
     earth_distance(ll_to_earth(62.0, 25.0), 
     ll_to_earth(lat, lon)) AS distance 
FROM venues 
WHERE earth_distance(ll_to_earth(62.0, 25.0), ll_to_earth(lat, lon)) <= radius 
ORDER BY earth_distance(ll_to_earth(62.0, 25.0), ll_to_earth(lat, lon))

Is it possible (and advisable) to re-use the result from earth_distance(ll_to_earth(62.0, 25.0), ll_to_earth(lat, lon)) instead of computing it separately for SELECT, WHERE and ORDER BY clauses?

Old Pro
  • 24,624
  • 7
  • 58
  • 106
randomguy
  • 12,042
  • 16
  • 71
  • 101
  • 3
    I think if the function is marked as [immutable](http://www.postgresql.org/docs/9.2/static/xfunc-volatility.html), the result will be re-used. Hopefully a Postgres expert can correct me if I'm wrong. – Mike Christensen Dec 28 '12 at 19:19
  • 2
    @MikeChristensen: Yes, that's how it usually works. Even `STABLE` is enough, because that declares the result constant within a single statement. `IMMUTABLE` is required to assert constant results even *between* transactions. That's needed for a function to be usable in an index for instance. – Erwin Brandstetter Dec 29 '12 at 11:11

3 Answers3

7

In the GROUP BY and ORDER BY clause you can refer to column aliases (output columns) or even ordinal numbers of SELECT list items. I quote the manual on ORDER BY:

Each expression can be the name or ordinal number of an output column (SELECT list item), or it can be an arbitrary expression formed from input-column values.

Bold emphasis mine.

But in the WHERE and HAVING clauses, you can only refer to columns from the base tables (input columns), so you have to spell out your function call.

SELECT *, earth_distance(ll_to_earth(62.0, 25.0), ll_to_earth(lat, lon)) AS dist
FROM   venues 
WHERE  earth_distance(ll_to_earth(62.0, 25.0), ll_to_earth(lat, lon)) <= radius 
ORDER  BY distance;

If you want to know if it's faster to pack the calculation into a CTE or subquery, just test it with EXPLAIN ANALYZE. (I doubt it.)

SELECT *
FROM  (
   SELECT *
         ,earth_distance(ll_to_earth(62.0, 25.0), ll_to_earth(lat, lon)) AS dist
   FROM   venues
   ) x
WHERE  distance <= radius 
ORDER  BY distance;

Like @Mike commented, by declaring a function STABLE (or IMMUTABLE) you inform the query planner that results from a function call can be reused multiple times for identical calls within a single statement. I quote the manual here:

A STABLE function cannot modify the database and is guaranteed to return the same results given the same arguments for all rows within a single statement. This category allows the optimizer to optimize multiple calls of the function to a single call.

Bold emphasis mine.

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Great answer. Just to be clear: You doubt that subquery will be faster than CTE or doubt that it's worth to use CTE/subquery over just computing it twice? – randomguy Dec 28 '12 at 22:26
  • 1
    @randomguy: I expect the plain form to be faster than with either subquery or CTE. But just test and see. Testing >> guessing. – Erwin Brandstetter Dec 28 '12 at 22:33
  • 1
    @ErwinBrandstetter : in my experience CTE's always are kept intact, even when referenced only once (which IMO should be equivalent to the equivalent VIEW or `FROM (subquery)` form). As a result, they will not be disassembled + reassembled by the plan generator, which *may* lead to sub-optimal plans. At the plus-side: this will keep the rangetables relatively small (and rather unrelated), which will reduce the combinatorial explosion in the number of possible plans. (I have not tried 9.2 yet) – wildplasser Dec 29 '12 at 16:11
  • @wildplasser: If your observation holds, usefulness of CTE's grows for very big and complex queries, where planning overhead starts to eat performance. You'd put subqueries that won't be further optimized into CTEs to reduce complexity of the main query. I'll have to test this when I find time. – Erwin Brandstetter Dec 29 '12 at 17:31
3

While I use MS SQL Server primarily, I'm pretty sure PostgreSQL supports CTEs. Try something like:

WITH CTE_venues AS (
SELECT *, earth_distance(ll_to_earth(62.0, 25.0), ll_to_earth(lat, lon)) AS distance 
FROM venues 
)
SELECT *
FROM CTE_venues 
WHERE distance <= radius 
ORDER BY distance
sgeddes
  • 62,311
  • 6
  • 61
  • 83
0

You can also create stand alone or packaged function and use it in your queries:

 SELECT *
   FROM ...
  WHERE distance <= your_function()  -- OR your_package_name.your_function()
 ORDER BY ...

You can use your function in select:

Select your_function() 
  From your_table...
 Where  ...
Art
  • 5,616
  • 1
  • 20
  • 22
  • 2
    Completely useless answer. 1) The calculated distance still needs to be in the select clause, ie `select *` won't work. 2) the result set still must be ordered by the function result. 3) This equates to a query absolutely no different from what was originally posted. The question is about function volatility, and whether the query optimizer evaluates the function three times or one. – Mike Christensen Dec 28 '12 at 21:02
  • 1
    Just trying to help... Please look at my example again - functions can be used in select... Just FYI... And the question was about reusing the results. Function returns a result and can be reused... – Art Dec 28 '12 at 21:06