I've got a table pings
with about 15 million rows in it. I'm on postgres 9.2.4. The relevant columns it has are a foreign key monitor_id
, a created_at
timestamp, and a response_time
that's an integer that represents milliseconds. Here is the exact structure:
Column | Type | Modifiers
-----------------+-----------------------------+----------------------------------------------------
id | integer | not null default nextval('pings_id_seq'::regclass)
url | character varying(255) |
monitor_id | integer |
response_status | integer |
response_time | integer |
created_at | timestamp without time zone |
updated_at | timestamp without time zone |
response_body | text |
Indexes:
"pings_pkey" PRIMARY KEY, btree (id)
"index_pings_on_created_at_and_monitor_id" btree (created_at DESC, monitor_id)
"index_pings_on_monitor_id" btree (monitor_id)
I want to query for all the response times that are not NULL
(90% won't be NULL
, about 10% will be NULL
), that have a specific monitor_id
, and that were created in the last month. I'm doing the query with ActiveRecord, but the end result looks something like this:
SELECT "pings"."response_time"
FROM "pings"
WHERE "pings"."monitor_id" = 3
AND (created_at > '2014-03-03 20:23:07.254281'
AND response_time IS NOT NULL)
It's a pretty basic query, but it takes about 2000ms to run, which seems rather slow. I'm assuming an index would make it faster, but all the indexes I've tried aren't working, which I'm assuming means I'm not indexing properly.
When I run EXPLAIN ANALYZE
, this is what I get:
Bitmap Heap Scan on pings (cost=6643.25..183652.31 rows=83343 width=4) (actual time=58.997..1736.179 rows=42063 loops=1) Recheck Cond: (monitor_id = 3) Rows Removed by Index Recheck: 11643313 Filter: ((response_time IS NOT NULL) AND (created_at > '2014-03-03 20:23:07.254281'::timestamp without time zone)) Rows Removed by Filter: 324834 -> Bitmap Index Scan on index_pings_on_monitor_id (cost=0.00..6622.41 rows=358471 width=0) (actual time=57.935..57.935 rows=366897 loops=1) Index Cond: (monitor_id = 3)
So there is an index on monitor_id
that is being used towards the end, but nothing else. I've tried various permutations and orders of compound indexes using monitor_id
, created_at
, and response_time
. I've tried ordering the index by created_at
in descending order. I've tried a partial index with response_time IS NOT NULL
.
Nothing I've tried makes the query any faster. How would you optimize and/or index it?