If you intend to use CLUSTER, the displayed syntax is invalid:
create CLUSTER ticket USING ticket_1_idx;
Run once:
CLUSTER ticket USING ticket_1_idx;
This can help a lot with bigger result sets, less so when only one or a few rows are returned.
If your table isn't read-only, the effect deteriorates over time. Re-run CLUSTER at reasonable intervals. Postgres remembers the index for subsequent calls, so this works, too:
CLUSTER ticket;
(But I would rather be explicit and use the first form.)
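To see which index Postgres has remembered for a table, you can check the indisclustered flag in the system catalog pg_index. A minimal sketch, assuming the table is named ticket as above:

```sql
-- Show the index currently marked as the clustering index for "ticket".
-- Returns no row if no index has been marked (e.g. CLUSTER was never run).
SELECT indexrelid::regclass AS clustered_index
FROM   pg_index
WHERE  indrelid = 'ticket'::regclass
AND    indisclustered;
```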
However, if you have lots of updates, CLUSTER (or VACUUM FULL) may actually be bad for performance. The right amount of bloat allows UPDATE to place new row versions on the same data page and avoids the need for extending the underlying physical file (expensively) too often. You can use a carefully tuned FILLFACTOR to get the best of both worlds.
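As a rough sketch of what that could look like: the value 90 below is only a placeholder, not a recommendation. The right number depends on your update pattern and has to be found by testing.

```sql
-- Leave roughly 10 % of every data page free for updated row versions.
-- 90 is an assumed value for illustration only.
ALTER TABLE ticket SET (fillfactor = 90);

-- The setting only affects pages written from now on.
-- Rewriting the table with CLUSTER applies it to the existing data, too.
CLUSTER ticket USING ticket_1_idx;
```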
CLUSTER takes an exclusive lock on the table, which may be a problem in a multi-user environment. Quoting the manual:
When a table is being clustered, an ACCESS EXCLUSIVE lock is acquired on it. This prevents any other database operations (both reads and writes) from operating on the table until the CLUSTER is finished.
Bold emphasis mine. Consider the alternatives!
pg_repack:
Unlike CLUSTER and VACUUM FULL it works online, without holding an exclusive lock on the processed tables during processing. pg_repack is efficient to boot, with performance comparable to using CLUSTER directly.
and:
pg_repack needs to take an exclusive lock at the end of the reorganization.
The current version 1.4.7 works with PostgreSQL 9.4 - 14.
pg_squeeze is a newer alternative that claims:
In fact we try to replace pg_repack extension.
The current version 1.4 works with Postgres 10 - 14.
Query
The query is simple enough not to cause any performance problems per se.
However: The BETWEEN construct includes boundaries. Your query selects all of Dec. 19, plus records from Dec. 20, 00:00. That's an extremely unlikely requirement. Chances are, you really want:
SELECT *
FROM ticket
WHERE created >= '2012-12-19 00:00'
AND created < '2012-12-20 00:00';
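To see the difference between the two forms in isolation, you can test the boundary value directly. The literals below just reuse the dates from the query above:

```sql
-- BETWEEN is inclusive on both ends, so the upper bound itself matches.
SELECT timestamp '2012-12-20 00:00'
       BETWEEN timestamp '2012-12-19 00:00'
           AND timestamp '2012-12-20 00:00' AS upper_bound_included;  -- true

-- The half-open range (>= lower, < upper) excludes it.
SELECT timestamp '2012-12-20 00:00' >= timestamp '2012-12-19 00:00'
   AND timestamp '2012-12-20 00:00' <  timestamp '2012-12-20 00:00'
       AS upper_bound_included;  -- false
```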
Performance
Why is it selecting sequential scan?
Your EXPLAIN output clearly shows an Index Scan, not a sequential table scan. There must be some kind of misunderstanding.
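If in doubt, you can re-check the plan yourself. A sketch, using the corrected query from above; note that the ANALYZE option actually executes the query:

```sql
-- The top plan node names the access method:
-- "Index Scan using ... on ticket" versus "Seq Scan on ticket".
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM   ticket
WHERE  created >= '2012-12-19 00:00'
AND    created <  '2012-12-20 00:00';
```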
You may be able to improve performance, but the necessary background information is not in the question. Possible options include:
Only query required columns instead of * to reduce transfer cost (and other performance benefits).
Look at partitioning and put practical time slices into separate tables. Add indexes to partitions as needed.
If partitioning is not an option, another related but less intrusive technique would be to add one or more partial indexes.
For example, if you mostly query the current month, you could create the following partial index:
CREATE INDEX ticket_created_idx ON ticket(created)
WHERE created >= '2012-12-01 00:00:00'::timestamp;
CREATE a new index right before the start of a new month. You can easily automate the task with a cron job.
Optionally DROP partial indexes for old months later.
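The monthly rotation could look roughly like this. The index name ticket_created_2013_01_idx is made up for illustration, and CONCURRENTLY is an optional addition to avoid blocking writes (it cannot run inside a transaction block):

```sql
-- Shortly before January starts: add the partial index for the new month.
-- The index name is hypothetical.
CREATE INDEX CONCURRENTLY ticket_created_2013_01_idx ON ticket (created)
WHERE  created >= '2013-01-01 00:00:00'::timestamp;

-- Later, once the older partial index is no longer wanted:
DROP INDEX CONCURRENTLY IF EXISTS ticket_created_idx;
```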
Keep the total index in addition for CLUSTER (which cannot operate on partial indexes). If old records never change, table partitioning would help this task a lot, since you only need to re-cluster newer partitions.
Then again if records never change at all, you probably don't need CLUSTER.
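A minimal sketch of the partitioning idea, assuming Postgres 10 or later (declarative partitioning). The table name ticket_part, the column ticket_id and the index name are placeholders; only the column created is known from the question.

```sql
-- Hypothetical partitioned variant of the table.
CREATE TABLE ticket_part (
    ticket_id bigint    NOT NULL,
    created   timestamp NOT NULL
) PARTITION BY RANGE (created);

-- One partition per month; the upper bound is exclusive.
CREATE TABLE ticket_part_2012_12 PARTITION OF ticket_part
    FOR VALUES FROM ('2012-12-01') TO ('2013-01-01');

-- Full (non-partial) index on the partition, so CLUSTER can use it.
CREATE INDEX ticket_part_2012_12_created_idx
    ON ticket_part_2012_12 (created);

-- Re-cluster only the partition that still receives writes.
CLUSTER ticket_part_2012_12 USING ticket_part_2012_12_created_idx;
```

Older partitions that no longer change keep their physical order and need no further maintenance of this kind.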
Performance Basics
You may be missing one of the basics. All the usual performance advice applies: