I tried adding an index on a foreign key column in a table containing tens of millions of rows and a semi-complex query I was running went from 4 seconds to 10 ms execution time. Nice!

I then dropped the index again to do some more testing, but when I executed the same query it still took 10 ms (compared to the 4 seconds it took before adding the index).

Does dropping an index reset the performance gain? If not, how can I completely delete the index?

The query looks like this (it runs once per minute and stores the result in another table):

SELECT COUNT(*) AS count
FROM (
   SELECT MAX(ze.timestamp) AS time, r.device_id
   FROM loc_zone_events ze
   INNER JOIN loc_zones z ON ze.zone_id = z.id
   INNER JOIN raw_events r ON ze.raw_event_id = r.id
   WHERE z.app_id = 1
   AND ROUND(EXTRACT('epoch' FROM NOW() - ze.timestamp) / 60) BETWEEN 0 AND 10
   GROUP BY r.device_id
   ORDER BY time DESC
   ) AS t

2 Answers

DROP INDEX completely drops the index.
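For example (the index name here is a hypothetical stand-in for yours):

DROP INDEX IF EXISTS loc_zone_events_zone_id_idx;  -- replace with the actual index name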

The transaction has to be committed before it takes effect for new queries, but that's not normally an issue. You are probably seeing other testing artifacts like:

  • Postgres flips to a different query plan after statistics have changed slightly. That would indicate that your cost settings or some other part of your configuration may be off.
  • Repeated execution of the query has populated the cache, which can make a big difference for big tables. For halfway comparable results, run all candidates a couple of times; see the EXPLAIN sketch after this list.
  • Your query is based on "the last ten minutes": there can be 1000 qualifying rows now and just 1 ten minutes later. That alone could make a big difference.
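To tell cache effects from plan changes, compare buffer hits and reads across repeated runs. A minimal sketch (substitute the query under test):

EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM loc_zone_events;  -- stand-in for your real query
-- "shared hit" = blocks served from the buffer cache
-- "read"       = blocks fetched from disk (or the OS cache)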

Query

For starters, remove completely unnecessary parts:

SELECT COUNT(*) AS count
FROM (
   SELECT 1
   FROM   loc_zones       z
   JOIN   loc_zone_events ze ON ze.zone_id = z.id
   JOIN   raw_events      r  ON r.id = ze.raw_event_id
   WHERE  z.app_id = 1
   AND    round(EXTRACT('epoch' FROM NOW() - ze.timestamp) / 60) BETWEEN 0 AND 10
   GROUP  BY r.device_id
   ) AS t;

Or:

SELECT COUNT(DISTINCT r.device_id) AS count
FROM   loc_zones       z
JOIN   loc_zone_events ze ON ze.zone_id = z.id
JOIN   raw_events      r  ON r.id = ze.raw_event_id
WHERE  z.app_id = 1
AND    round(EXTRACT('epoch' FROM NOW() - ze.timestamp) / 60) BETWEEN 0 AND 10;

(Not necessarily faster; count(DISTINCT col) is no performance hero.)

But there is more:

Your WHERE condition round(...) is not [sargable](http://en.wikipedia.org/wiki/Sargable). To retrieve events of "the last 10 minutes", use instead:

...
AND    ze.timestamp >= now() - interval '10 min'
AND    ze.timestamp <  now();  -- only if there can be timestamps in the future 

This is sargable and can use an index on ze.timestamp.
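Such an index could be as simple as (the name is illustrative):

CREATE INDEX loc_zone_events_ts_idx ON loc_zone_events ("timestamp");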

Note: Your expression uses round() rather than trunc(), so it effectively covers the range (-0.5, 10.5), which is 11 minutes (not 10), or 10.5 minutes if there are no future timestamps. Deal with that difference one way or the other ...
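To illustrate the boundaries with elapsed minutes as numeric literals:

SELECT round(10.4);  -- 10 → a row ~10.4 minutes old passes BETWEEN 0 AND 10
SELECT round(10.6);  -- 11 → excluded
SELECT round(-0.4);  --  0 → a timestamp slightly in the future passes, too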

Index

Since only the last 10 minutes seem to be relevant, you could further improve that with a partial index. The special difficulty here is the moving time frame. A related answer has a complete solution for that.

Building on that, you then have a partial index like:

CREATE INDEX ze_timestamp_recent_idx ON loc_zone_events ("timestamp" DESC)
WHERE  "timestamp" > f_min_ts();
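The function f_min_ts() must exist before creating the index and has to be IMMUTABLE. A minimal sketch along the lines of the linked technique (the cutoff constant is a placeholder; function and index are recreated periodically to advance the time frame):

CREATE OR REPLACE FUNCTION f_min_ts()
  RETURNS timestamp
  LANGUAGE sql IMMUTABLE AS
$$SELECT timestamp '2014-08-14 00:00'$$;  -- placeholder lower bound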

And adapt the query like:

WHERE  ...
AND    ze.timestamp > f_min_ts()   -- to match partial index
AND    ze.timestamp >= now() - interval '10 min'
AND    ze.timestamp <  now();

Aside: don't use the basic type name timestamp as a column name.
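One way out, if a rename is feasible (the new name is just a suggestion):

ALTER TABLE loc_zone_events RENAME COLUMN "timestamp" TO created_at;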

  • Thank you for your answer! The `WHERE` condition `round(...)` is because I must select only the events in the last 10 minutes. Maybe it is overly complex for what I need though – Samuel Bolduc Aug 14 '14 at 18:47
  • `WHERE ze.timestamp >= now() - '10 min'::interval` – wildplasser Aug 14 '14 at 18:50
  • @wildplasser I'm not sure it works as intended, because when I try the way you suggested it takes 2x as long as with `round(...)` and returns me a number much higher than it should – Samuel Bolduc Aug 14 '14 at 18:53
  • @SamuelBolduc: do you have future timestamps in the table? Also, there is a side effect from `round()`: your query effectively collects rows for 10.499999 minutes (11 minutes if there are future timestamps) - probably unintended. – Erwin Brandstetter Aug 14 '14 at 19:10
  • 1
    Sargable, where the hell did that word come from. +1, an education as always. – John Powell Aug 14 '14 at 19:30
  • 1
    @JohnBarça: From: **S** earch **ARG** ument **ABLE**. I added a link to [Wikipedia](http://en.wikipedia.org/wiki/Sargable). – Erwin Brandstetter Aug 14 '14 at 19:35
  • @ErwinBrandstetter. Yes, I read the link, sorry, my English isn't very good tonight. I meant more why haven't I heard that term before. It is a good one, though an odd sounding word. – John Powell Aug 14 '14 at 19:45
  • @JohnBarça: It's an odd word for me, too. But useful to cut the explanation short. – Erwin Brandstetter Aug 14 '14 at 19:49
  • For the different number, it was a mix of erroneous query on my end and the `round(...)`. Thank you for the very good answer, covered much more than the initial question but I learned many things, thanks! – Samuel Bolduc Aug 14 '14 at 20:06
  • I did not know the word `sargable` either. Looks like the DBMS equivalent of *(loop-invariant) constant hoisting* in compilers. BTW, I seem to remember that `WHERE COALESCE(zdate,'1900-01-01') = '1900-01-01'` is rewritten to the `... OR ... IS NULL` form (which would mean that coalesce() is not a true/pure function). – wildplasser Aug 16 '14 at 11:51

It depends on which database you are using.

1.) If you have a large database (as you describe), you would most likely partition it and create indexes on the partitions (see the sketch after this list).

2.) Creating an index on a large table while also running queries slows both down, because two of your processes are now competing for database resources.

3.) While you were running your query, were other processes inserting/updating/deleting rows? Do you have enough temporary space? Does your query do sorting/grouping operations?
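A minimal sketch for point 1, assuming PostgreSQL's declarative partitioning (version 10 or later; names, columns and bounds are illustrative):

CREATE TABLE loc_zone_events_part (
  id           bigserial
, zone_id      int
, raw_event_id int
, ts           timestamp NOT NULL
) PARTITION BY RANGE (ts);

CREATE TABLE loc_zone_events_2014_08 PARTITION OF loc_zone_events_part
    FOR VALUES FROM ('2014-08-01') TO ('2014-09-01');

CREATE INDEX ON loc_zone_events_2014_08 (ts DESC);  -- index each partition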

All of these matter, and in my view the architecture of your database matters even more.

cheers!
