Slow PostgreSQL query in production - help me understand this explain analyze output

Question

I have a query that is taking 9 minutes to run on PostgreSQL 9.0.0 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit

This query is automatically generated by hibernate for my application. It's trying to find all of the "teacher members" in a school. A membership is a user with a role in a group. There are several types of groups, but here what matters are schools and services. If someone is a teacher member in a service and a member in this school (15499) then they are what we are looking for.

This query used to run fine in production and still runs fine in development, but in production it is now taking several minutes to run. Can you help me understand why?

Here's the query:

select distinct user1_.ID as ID14_, user1_.FIRST_NAME as FIRST2_14_, user1_.LAST_NAME as LAST3_14_, user1_.STREET_1 as STREET4_14_, user1_.STREET_2 as STREET5_14_, user1_.CITY as CITY14_, user1_.us_state_id as us7_14_, user1_.REGION as REGION14_, user1_.country_id as country9_14_, user1_.postal_code as postal10_14_, user1_.USER_NAME as USER11_14_, user1_.PASSWORD as PASSWORD14_, user1_.PROFESSION as PROFESSION14_, user1_.PHONE as PHONE14_, user1_.URL as URL14_, user1_.bio as bio14_, user1_.LAST_LOGIN as LAST17_14_, user1_.STATUS as STATUS14_, user1_.birthdate as birthdate14_, user1_.ageInYears as ageInYears14_, user1_.deleted as deleted14_, user1_.CREATEDATE as CREATEDATE14_, user1_.audit as audit14_, user1_.migrated2008 as migrated24_14_, user1_.creator as creator14_ 
from DIR_MEMBERSHIPS membership0_ 
inner join DIR_USERS user1_ on membership0_.USER_ID=user1_.ID, DIR_ROLES role2_, DIR_GROUPS group4_ 
where membership0_.role=role2_.ID 
and membership0_.GROUP_ID=group4_.id 
and membership0_.GROUP_ID=15499 
and case when membership0_.expires is null 
    then 1 
    else case when (membership0_.expires > CURRENT_TIMESTAMP and (membership0_.startDate is null or membership0_.startDate < CURRENT_TIMESTAMP)) 
        then 1 
        else 0 end 
    end =1 
and membership0_.deleted=false 
and role2_.deleted=false 
and role2_.NAME='ROLE_MEMBER' 
and group4_.deleted=false 
and user1_.STATUS='active' 
and user1_.deleted=false 
and (membership0_.USER_ID in (
    select membership7_.USER_ID 
    from DIR_MEMBERSHIPS membership7_, DIR_USERS user8_, DIR_ROLES role9_ 
    where membership7_.USER_ID=user8_.ID 
    and membership7_.role=role9_.ID 
    and case when membership7_.expires is null 
        then 1 
        else case when (membership7_.expires > CURRENT_TIMESTAMP 
                        and (membership7_.startDate is null or membership7_.startDate < CURRENT_TIMESTAMP)) 
            then 1 
            else 0 end 
        end =1 
    and membership7_.deleted=false 
    and role9_.NAME='ROLE_TEACHER_MEMBER'));

Explain analyze output:

 HashAggregate  (cost=61755.63..61755.64 rows=1 width=3334) (actual time=652504.302..652504.307 rows=4 loops=1)
   ->  Nested Loop  (cost=4355.35..61755.56 rows=1 width=3334) (actual time=304.450..652504.217 rows=6 loops=1)
     ->  Nested Loop  (cost=4355.35..61747.28 rows=1 width=3342) (actual time=304.419..652504.060 rows=6 loops=1)
           ->  Nested Loop Semi Join  (cost=4355.35..61738.97 rows=1 width=32) (actual time=304.385..652503.961 rows=6 loops=1)
                 Join Filter: (user_id = user_id)
                 ->  Nested Loop  (cost=0.00..32.75 rows=1 width=16) (actual time=0.190..26.703 rows=758 loops=1)
                       ->  Seq Scan on dir_roles role2_  (cost=0.00..1.25 rows=1 width=8) (actual time=0.032..0.038 rows=1 loops=1)
                             Filter: ((NOT deleted) AND ((name)::text = 'ROLE_MEMBER'::text))
                       ->  Index Scan using dir_memberships_role_group_id_index on dir_memberships membership0_  (cost=0.00..31.49 rows=1 width=24) (actual time=0.151..25.626 rows=758 loops=1)
                             Index Cond: ((role = role2_.id) AND (group_id = 15499))
                             Filter: ((NOT deleted) AND (CASE WHEN (expires IS NULL) THEN 1 ELSE CASE WHEN ((expires > now()) AND ((startdate IS NULL) OR (startdate < now()))) THEN 1 ELSE 0 END END = 1))
                 ->  Nested Loop  (cost=4355.35..61692.86 rows=1069 width=16) (actual time=91.088..843.967 rows=79986 loops=758)
                       ->  Nested Loop  (cost=4355.35..54185.33 rows=1069 width=8) (actual time=91.065..555.830 rows=79986 loops=758)
                             ->  Seq Scan on dir_roles role9_  (cost=0.00..1.25 rows=1 width=8) (actual time=0.006..0.013 rows=1 loops=758)
                                   Filter: ((name)::text = 'ROLE_TEACHER_MEMBER'::text)
                             ->  Bitmap Heap Scan on dir_memberships membership7_  (cost=4355.35..53983.63 rows=16036 width=16) (actual time=91.047..534.236 rows=79986 loops=758)
                                   Recheck Cond: (role = role9_.id)
                                   Filter: ((NOT deleted) AND (CASE WHEN (expires IS NULL) THEN 1 ELSE CASE WHEN ((expires > now()) AND ((startdate IS NULL) OR (startdate < now()))) THEN 1 ELSE 0 END END = 1))
                                   ->  Bitmap Index Scan on dir_memberships_role_index  (cost=0.00..4355.09 rows=214190 width=0) (actual time=87.050..87.050 rows=375858 loops=758)
                                         Index Cond: (role = role9_.id)
                       ->  Index Scan using dir_users_pkey on dir_users user8_  (cost=0.00..7.01 rows=1 width=8) (actual time=0.003..0.003 rows=1 loops=60629638)
                             Index Cond: (id = user_id)
           ->  Index Scan using dir_users_pkey on dir_users user1_  (cost=0.00..8.29 rows=1 width=3334) (actual time=0.011..0.011 rows=1 loops=6)
                 Index Cond: (id = user_id)
                 Filter: ((NOT deleted) AND ((status)::text = 'active'::text))
     ->  Index Scan using dir_groups_pkey on dir_groups group4_  (cost=0.00..8.28 rows=1 width=8) (actual time=0.023..0.023 rows=1 loops=6)
           Index Cond: (group4_.id = 15499)
           Filter: (NOT group4_.deleted)
Total runtime: 652504.827 ms
(29 rows)

I am reading and reading forum posts and the user manual, but I can't figure out what would make this run faster, except maybe if it were possible to make indexes for the select that uses the now() function.

You may be interested in this [related question on dba.SE](http://dba.stackexchange.com/q/18664/3684) concerning auto-vacuum. — Erwin Brandstetter, Jul 25 '12 at 16:30

Erwin Brandstetter · Accepted Answer · 2020-10-13T10:12:46.407

I rewrote your query and assume this will be faster:

SELECT u.id AS id14_, u.first_name AS first2_14_, u.last_name AS last3_14_, u.street_1 AS street4_14_, u.street_2 AS street5_14_, u.city AS city14_, u.us_state_id AS us7_14_, u.region AS region14_, u.country_id AS country9_14_, u.postal_code AS postal10_14_, u.user_name AS user11_14_, u.password AS password14_, u.profession AS profession14_, u.phone AS phone14_, u.url AS url14_, u.bio AS bio14_, u.last_login AS last17_14_, u.status AS status14_, u.birthdate AS birthdate14_, u.ageinyears AS ageinyears14_, u.deleted AS deleted14_, u.createdate AS createdate14_, u.audit AS audit14_, u.migrated2008 AS migrated24_14_, u.creator AS creator14_
FROM   dir_users u 
WHERE  u.status = 'active'
AND    u.deleted = FALSE
AND    EXISTS (
   SELECT 1
   FROM   dir_memberships m
   JOIN   dir_roles       r ON r.id = m.role
   JOIN   dir_groups      g ON g.id = m.group_id
   WHERE  m.group_id = 15499
   AND    m.user_id = u.id
   AND   (m.expires IS NULL
       OR m.expires > now() AND (m.startdate IS NULL OR m.startdate < now()))
   AND    m.deleted = FALSE
   AND    r.deleted = FALSE
   AND    r.name = 'ROLE_MEMBER'
   AND    g.deleted = FALSE
   )
AND    EXISTS (
    SELECT 1
    FROM   dir_memberships m
    JOIN   dir_roles       r ON r.id = m.role
    WHERE (m.expires IS NULL
        OR m.expires > now() AND (m.startDate IS NULL OR m.startDate < now()))
    AND    m.deleted = FALSE
    AND    m.user_id = u.id
    AND    r.name = 'ROLE_TEACHER_MEMBER'
    )

Rewrite with `EXISTS`

Replaced the weird case ... end = 1 expressions with simple expressions
Rewrote all JOINs with explicit join syntax to make it easier to read.
Transformed the big JOIN construct and the IN expression into two EXISTS semi-joins, which voids the necessity for DISTINCT. This should be quite a bit faster.
Lots of minor edits to make the query simpler, but they don't change the substance.
Especially use simper aliases - what you had was noisy and confusing.

Indexes

If this isn't fast enough yet, and your write performance can deal with more indexes, add this partial multi-column index:

CREATE INDEX dir_memberships_g_id_u_id_idx ON dir_memberships (group_id, user_id)
WHERE  deleted = FALSE;

The WHERE conditions have to match your query for the index to be useful!

I assume that you already have primary keys and indexes on relevant foreign keys.

Further:

CREATE INDEX dir_memberships_u_id_role_idx ON dir_memberships (user_id, role)
WHERE  deleted = FALSE;

Why user_id a second time?. See:

Also, since user_id is already used in another index you are not blocking HOT-updates (which can only be used with columns not involved in any indexes.

Why role?
I assume both columns are of type integer (4 bytes). I have seen in your detailed question, that you run a 64 bit OS where MAXALIGN 8 bytes, so another integer will not make the index grow at all. I threw in role which might be useful for the second EXISTS semi-join.

If you have many "dead" users, this might also help:

CREATE INDEX dir_users_id_idx ON dir_users (id)
WHERE status = 'active' AND deleted = FALSE;

As always, check with EXPLAIN to see whether the indexes actually get used. You wouldn't want useless indexes consuming resources.

Are we fast yet?

Of course, all the usual advice for performance optimization applies, too.

The weird `case` expressions are fairly typical in ORM as calculated fields. — sayap, Jul 25 '12 at 04:42
Wow! Thank you @Erwin for all that work. I hadn't considered rewriting the query yet since it is auto-generated by hibernate. It would be possible to drop down to sql in my app if needed, but if better indexes or vacuuming would help that would be a better solution I think. — atroutt, Jul 25 '12 at 14:08
Also, I had thought about adding a new index, but postgres doesn't allow indexes to have functions in it like now() ('ERROR: functions in index predicate must be marked IMMUTABLE '). Otherwise I was thinking something like dir_memberships_u_id_role_idx would help. — atroutt, Jul 25 '12 at 14:11
@atroutt: Oh, what an oversight on my side. Of course you cannot use `now()` in the index condition. I removed that condition, making the index less effective, but probably still useful. I have a similar setup where I recreate the index on a daily basis with the current date instead of now(). To make use of this, you have to add matching expressions to the query (in addition to the exact condition with `now()`. This is quite effective, but requires some overhead. Is your query still slow after you run `VACUUM FULL ANALYZE` on the involved tables? May be enough to fix autovacuum .. — Erwin Brandstetter, Jul 25 '12 at 14:44
@atroutt: Other than that, does the rewritten query work? I did that blindfolded without the database to test it. And how does it compare to the original? — Erwin Brandstetter, Jul 25 '12 at 14:47
@atroutt: From 11 minutes down to half a second. Nice. :) Does that include effects of `VACUUM`? If so, what is the runtime of your original query now? — Erwin Brandstetter, Jul 25 '12 at 15:10
@Erwin I haven't vacuumed yet--I don't want to do that on the production system without understanding how it would impact our app users, so I'm talking to the sysadmins about it first. — atroutt, Jul 25 '12 at 16:06
@atroutt: Thanks for the feedback. Goes to show how ORMs butcher performance. They are such crude crutches for more complex queries. — Erwin Brandstetter, Jul 25 '12 at 16:17
@atroutt: How did this pan out? Faster yet after vacuuming? Got auto-vacuum to work? — Erwin Brandstetter, Aug 06 '12 at 18:29
I confirmed that the auto-vacuum daemon is running. I tried manually vacuuming the tables and recreating the indexes that are relevant here like dir_memberships_role_group_id_index, but that didn't speed anything up significantly. I'm going to use your query instead @ErwinBrandstetter Thanks! — atroutt, Aug 14 '12 at 15:17

score 4 · Answer 2 · answered Jul 25 '12 at 05:09

4

The query, minus the last 4 conditions, i.e.

and group4_.deleted=false 
and user1_.STATUS='active' 
and user1_.deleted=false 
and (membership0_.USER_ID in (...))

returns 758 rows. Each of these 758 rows will then go through the select membership7_.USER_ID ... subquery, which takes 843.967 miliseconds to run.

843.967 * 758 = 639726.986, there goes the 10 minutes.

As for tuning the query, I don't think you need DIR_USERS user8_ in the subquery. You can start by removing it, and also changing the subquery to use EXISTS instead of IN.

By the way, is the database being vacuumed? Even without any tuning, it doesn't look that complex a query or that much of data to require 10 minutes.

answered Jul 25 '12 at 05:09

sayap

6,169
2
36
40

I looked it up in pg_stat_user_tables and it looks like the database has never been vacuumed. – atroutt Jul 25 '12 at 13:42
You may be interested in the [auto-vaccuum daemon](http://www.postgresql.org/docs/current/interactive/routine-vacuuming.html#AUTOVACUUM). It's on by default in modern versions of PostgreSQL. – Erwin Brandstetter Jul 25 '12 at 13:57
1

@atroutt: PostgreSQL 9.0 should do the vacuuming automatically. But you might want to consider an update to 9.0.8. There were several fixes regarding vacuum since 9.0.0 (your version) – Jul 25 '12 at 14:56
Yes, we verified that the autovacuum daemon is running and has vacuumed some tables. – atroutt Aug 14 '12 at 15:13

Slow PostgreSQL query in production - help me understand this explain analyze output

2 Answers2

Rewrite with `EXISTS`

Indexes

Linked

Slow PostgreSQL query in production - help me understand this explain analyze output

2 Answers2

Rewrite with EXISTS

Indexes

Linked

Rewrite with `EXISTS`