The explanation behind the difference you observed is this:
Postgres has column statistics and can adapt the query plan depending on the value of a provided constant for datetime_threshold
. With favorable filter values, this can lead to a much more efficient query plan.
In the other case, when datetime_threshold
has to be computed in another SELECT
first, Postgres has to default to a generic plan. datetime_threshold
could be anything.
The difference will become obvious in EXPLAIN
output.
To make sure Postgres optimizes the second part for the actual datetime_threshold
value, you can either run two separate queries (feed the result of query 1 as constant to query 2), or use dynamic SQL to force re-planning of query 2 every time in a PL/pgSQL function.
For example
CREATE OR REPLACE FUNCTION foo(_user_id int, _distance int = 70)
RETURNS SETOF locations
LANGUAGE plpgsql AS
$func$
BEGIN
RETURN QUERY EXECUTE
'SELECT *
FROM locations
WHERE user_id = $1
AND datetime > $2'
USING _user_id
, (SELECT max(datetime)
FROM locations
WHERE distance > _distance
AND user_id = _user_id);
END
$func$;
Call:
SELECT * FROM foo(9087);
Related:
In extreme cases, you might even use another dynamic query to calculate datetime_threshold
. But I don't expect that's necessary.
As for "something useful in the docs":
[...] The important difference is that EXECUTE
will re-plan the
command on each execution, generating a plan that is specific to the
current parameter values; whereas PL/pgSQL may otherwise create a
generic plan and cache it for re-use. In situations where the best
plan depends strongly on the parameter values, it can be helpful to
use EXECUTE
to positively ensure that a generic plan is not selected.
Bold emphasis mine.
Indexes
Perfect indexes would be:
CREATE INDEX ON locations (user_id, distance DESC NULL LAST, date_time DESC NULLS LAST); -- for query 1
CREATE INDEX ON locations (user_id, date_time); -- for query 2
Fine tuning depends on undisclosed details. Partial index might be an option.
There may be any number of additional reasons why your query is slow. Not enough details.