
In PostgreSQL, I have an index on a date field on my tickets table. When I compare the field against now(), the query is pretty efficient:

# explain analyze select count(1) as count from tickets where updated_at > now();
                                                             QUERY PLAN                                                                  
---------------------------------------------------------------------------------------------------------------------------------------------
Aggregate  (cost=90.64..90.66 rows=1 width=0) (actual time=33.238..33.238 rows=1 loops=1)
   ->  Index Scan using tickets_updated_at_idx on tickets  (cost=0.01..90.27 rows=74 width=0) (actual time=0.016..29.318 rows=40250 loops=1)
         Index Cond: (updated_at > now())
Total runtime: 33.271 ms

It goes downhill and uses a Bitmap Heap Scan if I try to compare it against now() minus an interval.

# explain analyze select count(1) as count from tickets where updated_at > (now() - '24 hours'::interval);
                                                                  QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate  (cost=180450.15..180450.17 rows=1 width=0) (actual time=543.898..543.898 rows=1 loops=1)
   ->  Bitmap Heap Scan on tickets  (cost=21296.43..175963.31 rows=897368 width=0) (actual time=251.700..457.916 rows=924373 loops=1)
         Recheck Cond: (updated_at > (now() - '24:00:00'::interval))
         ->  Bitmap Index Scan on tickets_updated_at_idx  (cost=0.00..20847.74 rows=897368 width=0) (actual time=238.799..238.799 rows=924699 loops=1)
               Index Cond: (updated_at > (now() - '24:00:00'::interval))
Total runtime: 543.952 ms

Is there a more efficient way to query using date arithmetic?

Don Pflaster
  • Version? `select version()`. Post the table schema `\d tickets` – Clodoaldo Neto Jul 15 '14 at 13:24
  • Have you run `ANALYZE` on the table? BTW, the first query uses an index only scan while the second processes many more records and has to scan the table itself. – Ihor Romanchenko Jul 15 '14 at 13:40
  • @IgorRomanchenko: I see an index scan in the first query, not an index only scan. – Erwin Brandstetter Jul 15 '14 at 14:00
  • 40250 rows or 924699 rows, it makes a difference... – Frank Heikens Jul 15 '14 at 14:06
  • Version is 9.1 (I know, we're working on upgrading as we speak :) ) – Don Pflaster Jul 15 '14 at 14:39
  • The basic thrust of the question, though, is - why does the addition of the interval make the comparison use a different strategy? Shouldn't it just figure out NOW() - '24 hours'::interval and then try to compare that to the index? Can I get around it by casting the where condition to something else? – Don Pflaster Jul 15 '14 at 14:45
  • It's not the addition per se. Replace `now()` and `now() - '24 hours'::interval` with respective timestamp literals and you get the same result. It's the *expected* number of rows to be found (74 vs 897368) that matters. As explained in my answer. – Erwin Brandstetter Jul 15 '14 at 14:51
  • Sorry, I get it now. updated_at > now() returns fewer rows. Rrg. Thanks, all. – Don Pflaster Jul 15 '14 at 14:51

1 Answer


The 1st query expects to find rows=74, but actually finds rows=40250.
The 2nd query expects to find rows=897368 and actually finds rows=924699.

Of course, processing 23 times as many rows takes considerably more time. So your actual times are not surprising.

Statistics for data with updated_at > now() are outdated. Run:

ANALYZE tickets;

and repeat your queries. And you seriously have data with updated_at > now()? That sounds wrong.

It's not surprising, however, that statistics are outdated for the most recently changed data. That's in the nature of things. If your query depends on current statistics, you have to run ANALYZE before you run your query.
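
To check when statistics were last gathered for the table (autovacuum normally takes care of this eventually), the standard statistics view can tell you:

SELECT relname, last_analyze, last_autoanalyze
FROM   pg_stat_user_tables
WHERE  relname = 'tickets';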

Also test with (in your session only):

SET enable_bitmapscan = off;

and repeat your second query to see times without bitmap index scan.
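
When you are done experimenting, restore the planner default in the same session:

RESET enable_bitmapscan;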

Why bitmap index scan for more rows?

A plain index scan fetches rows from the heap sequentially as found in the index. That's simple, dumb and without overhead. Fast for few rows, but may end up more expensive than a bitmap index scan with a growing number of rows.

A bitmap index scan collects rows from the index before looking up the table. If multiple rows reside on the same data page, that saves repeated visits and can make things considerably faster. The more rows, the greater the chance that a bitmap index scan will save time.

For even more rows (around 5% of the table, heavily depends on actual data), the planner switches to a sequential scan of the table and doesn't use the index at all.
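
You can watch the planner switch strategies by widening the time range step by step. The intervals below are only illustrative; where the switch happens depends entirely on your data and cost settings:

EXPLAIN SELECT count(1) FROM tickets WHERE updated_at > now() - interval '1 hour';  -- few rows: plain index scan likely
EXPLAIN SELECT count(1) FROM tickets WHERE updated_at > now() - interval '1 day';   -- more rows: bitmap heap scan likely
EXPLAIN SELECT count(1) FROM tickets WHERE updated_at > now() - interval '1 year';  -- most of the table: sequential scan likely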

The optimum would be an index only scan, introduced with Postgres 9.2. That's only possible if certain preconditions are met: all relevant columns are included in the index, the index type supports it, and the visibility map indicates that all rows on the data page are visible to all transactions. Then the page doesn't have to be fetched from the heap (the table) at all; the information in the index is enough.
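
As a rough sketch of what that looks like after an upgrade (assumes 9.2 or later; VACUUM keeps the visibility map current so that index only scans become possible):

VACUUM tickets;  -- updates the visibility map

EXPLAIN ANALYZE
SELECT count(1) AS count FROM tickets WHERE updated_at > now() - interval '24 hours';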

The decision depends on your statistics (how many rows Postgres expects to find and their distribution) and on cost settings, most importantly random_page_cost, cpu_index_tuple_cost and effective_cache_size.
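
For session-level experiments only (the values below are illustrative, not recommendations):

SET random_page_cost = 1.1;        -- closer to seq_page_cost, favors index access (common for SSD storage)
SET effective_cache_size = '4GB';  -- rough estimate of shared_buffers plus the OS file cache

EXPLAIN ANALYZE
SELECT count(1) AS count FROM tickets WHERE updated_at > (now() - '24 hours'::interval);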

Erwin Brandstetter
  • So it sounds like I should just upgrade to 9.3 and see if the issue persists. I should upgrade anyway. – Don Pflaster Jul 15 '14 at 14:43
  • Oh, I feel silly. I understand the updated_at > NOW() - I wasn't even thinking about that when I was playing with the queries. Of course there are fewer rows. – Don Pflaster Jul 15 '14 at 14:50