Nice question, I have came around this problem a while ago.
Why this happens?
You should look into the number of user='abcd'
values in your stats like this:
SELECT attname, null_frac, ag_width, n_distinct,
most_common_vals, most_common_freqs, histogram_bounds
FROM pg_stats
WHERE table_name='T';
My guess is — this value occurs quite often and you'll find it in the most_common_vals
output.
Picking the same element from the most_common_freqs
you'll get the ratio for the value, multiply it by total number of rows (can be obtained from pg_class
) to get number of rows that are estimated to have 'abcd'
value.
Planner assumes all values to have a linear distribution. In reality things are different of course.
Also, currently there're no correlated stats (although some work is being done in this direction).
So, let's take user='abcd'
value, having 0.001
ratio (per question) in the corresponding most_common_freqs
entry. This means value will occur every 1000 rows (assuming linear distribution). It appears, that if we'll scan table in any way we'll hit our user='abcd'
in some 1000 rows. Sounds that it should be fast! Planner "thinks" the same and chooses index on timestamp
column.
But it is not. If we'll assume, that your table T
contains logs of user activity, and user='abcd'
was on vacation for the last 3 weeks, then this means we'll have to read quite a lot of rows from the timestamp
index (3 weeks worth of data) before we actually hit the row we want. Well, you as DBA know this, but planner assumes linear distribution.
So, how to fix?
You'll have to trick the planner to use what you need, as you have more knowledge of your data.
Use OFFSET 0
trick with subquery:
SELECT *
FROM
(
SELECT * FROM T WHERE user='abcd' OFFSET 0
)
ORDER BY timestamp
LIMIT 1;
This trick protects query from inlining, therefore inner part is executed on it's own.
Use CTE
(named subquery):
WITH s AS (
SELECT * FROM T WHERE user='abcd'
)
SELECT *
FROM s
ORDER BY timestamp
LIMIT 1;
Per documentation:
A useful property of WITH queries is that they are evaluated only once per execution of the parent query, even if they are referred to more than once by the parent query or sibling WITH queries.
Use count(*)
for aggrgated queries:
SELECT min(session_id), count(*) -- instead of simply `min(session_id)`
FROM T
WHERE user='abcd'
ORDER BY timestamp
LIMIT 1;
This is not really applicable, but I wanted to mention it.
And please, consider upgrading to 9.3.
P.S. More on row estiamtes in the docs of course.