On postgresql 9.3, I have a table with a little over a million records, the table was created as:
CREATE TABLE entradas
(
id serial NOT NULL,
uname text,
contenido text,
fecha date,
hora time without time zone,
fecha_hora timestamp with time zone,
geom geometry(Point,4326),
CONSTRAINT entradas_pkey PRIMARY KEY (id)
)
WITH (
OIDS=FALSE
);
ALTER TABLE entradas
OWNER TO postgres;
CREATE INDEX entradas_date_idx
ON entradas
USING btree
(fecha_hora);
CREATE INDEX entradas_gix
ON entradas
USING gist
(geom);
I'm executing a query to aggregate rows on time intervals as follows:
WITH x AS (
SELECT t1, t1 + interval '15min' AS t2
FROM generate_series('2014-12-02 0:0' ::timestamp
,'2014-12-02 23:45' ::timestamp, '15min') AS t1
)
select distinct
x.t1,
count(t.id) over w
from x
left join entradas t on t.fecha_hora >= x.t1
AND t.fecha_hora < x.t2
window w as (partition by x.t1)
order by x.t1
This query takes about 50 seconds. From the output of explain, you can see that the timestamp index is not used:
Unique (cost=86569161.81..87553155.15 rows=131199111 width=12)
CTE x
-> Function Scan on generate_series t1 (cost=0.00..12.50 rows=1000 width=8)
-> Sort (cost=86569149.31..86897147.09 rows=131199111 width=12)
Sort Key: x.t1, (count(t.id) OVER (?))
-> WindowAgg (cost=55371945.38..57667929.83 rows=131199111 width=12)
-> Sort (cost=55371945.38..55699943.16 rows=131199111 width=12)
Sort Key: x.t1
-> Nested Loop Left Join (cost=0.00..26470725.90 rows=131199111 width=12)
Join Filter: ((t.fecha_hora >= x.t1) AND (t.fecha_hora < x.t2))
-> CTE Scan on x (cost=0.00..20.00 rows=1000 width=16)
-> Materialize (cost=0.00..49563.88 rows=1180792 width=12)
-> Seq Scan on entradas t (cost=0.00..37893.92 rows=1180792 width=12)
However, if i do set enable_seqscan=false
(I know, one should never do this), then the query executes in less than a second and the output of explain shows that it is using the index on the timestamp column:
Unique (cost=91449584.16..92433577.50 rows=131199111 width=12)
CTE x
-> Function Scan on generate_series t1 (cost=0.00..12.50 rows=1000 width=8)
-> Sort (cost=91449571.66..91777569.44 rows=131199111 width=12)
Sort Key: x.t1, (count(t.id) OVER (?))
-> WindowAgg (cost=60252367.73..62548352.18 rows=131199111 width=12)
-> Sort (cost=60252367.73..60580365.51 rows=131199111 width=12)
Sort Key: x.t1
-> Nested Loop Left Join (cost=1985.15..31351148.25 rows=131199111 width=12)
-> CTE Scan on x (cost=0.00..20.00 rows=1000 width=16)
-> Bitmap Heap Scan on entradas t (cost=1985.15..30039.14 rows=131199 width=12)
Recheck Cond: ((fecha_hora >= x.t1) AND (fecha_hora < x.t2))
-> Bitmap Index Scan on entradas_date_idx (cost=0.00..1952.35 rows=131199 width=0)
Index Cond: ((fecha_hora >= x.t1) AND (fecha_hora < x.t2))
Why is postgres not using entradas_date_idx
unless I force it to even if executing the query is way faster using it?
How could I make postgres use entradas_date_idx
without resorting to set enable_seqscan=false
?