I am quite new to optimizing the speed of a select, but I have the one below which is time consuming. I would be grateful for suggestions to improve performance.
SELECT DISTINCT p.id "pub_id",
p.submission_year,
pip.level,
mv_s.integer_value "total_citations",
1 "count_pub"
FROM publication p
JOIN organisation_association "oa" ON (oa.publication_id = p.id
AND oa.organisation_id IN (249189578,
249189824))
JOIN bfi_2017 "pip" ON (p.uuid = pip.uuid
AND pip.bfi_score > 0
AND pip.bfi_score IS NOT NULL)
LEFT JOIN metric_value mv_s ON (mv_s.name = 'citations'
AND EXISTS
(SELECT *
FROM publication_metrics pm_s
JOIN metrics m_s ON (m_s.id = pm_s.metrics_id
AND m_s.source_id = 210247389
AND pm_s.publication_id = p.id
AND mv_s.metrics_id = m_s.id)))
WHERE p.peer_review = 'true'
AND (p.type_classification_id IN (57360320,
57360322,
57360324,
57360326,
57360350))
AND p.submission_year = 2017
Execute plan:
"Unique (cost=532129954.32..532286422.32 rows=4084080 width=24) (actual time=1549616.424..1549616.582 rows=699 loops=1)"
" Buffers: shared read=27411, temp read=1774656 written=2496"
" -> Sort (cost=532129954.32..532169071.32 rows=15646800 width=24) (actual time=1549616.422..1549616.445 rows=712 loops=1)"
" Sort Key: p.id, pip.level, mv_s.integer_value"
" Sort Method: quicksort Memory: 80kB"
" Buffers: shared read=27411, temp read=1774656 written=2496"
" -> Nested Loop Left Join (cost=393.40..529618444.45 rows=15646800 width=24) (actual time=1832.122..1549614.196 rows=712 loops=1)"
" Join Filter: (SubPlan 1)"
" Rows Removed by Join Filter: 607313310"
" Buffers: shared read=27411, temp read=1774656 written=2496"
" -> Nested Loop (cost=393.40..8704.01 rows=37 width=16) (actual time=5.470..125.773 rows=712 loops=1)"
" Buffers: shared hit=20313 read=4585"
" -> Hash Join (cost=392.97..7886.65 rows=72 width=16) (actual time=5.160..77.182 rows=3417 loops=1)"
" Hash Cond: ((p.uuid)::text = (pip.uuid)::text)"
" Buffers: shared hit=2 read=3670"
" -> Bitmap Heap Scan on publication p (cost=160.30..7643.44 rows=2618 width=49) (actual time=2.335..67.546 rows=4527 loops=1)"
" Recheck Cond: (submission_year = 2017)"
" Filter: (peer_review AND (type_classification_id = ANY ('{57360320,57360322,57360324,57360326,57360350}'::bigint[])))"
" Rows Removed by Filter: 3975"
" Heap Blocks: exact=3556"
" Buffers: shared hit=2 read=3581"
" -> Bitmap Index Scan on idx_in2ix3rvuzxxf76bsipgn4l4sy (cost=0.00..159.64 rows=8430 width=0) (actual time=1.784..1.784 rows=8502 loops=1)"
" Index Cond: (submission_year = 2017)"
" Buffers: shared read=27"
" -> Hash (cost=181.61..181.61 rows=4085 width=41) (actual time=2.787..2.787 rows=4085 loops=1)"
" Buckets: 4096 Batches: 1 Memory Usage: 324kB"
" Buffers: shared read=89"
" -> Seq Scan on bfi_2017 pip (cost=0.00..181.61 rows=4085 width=41) (actual time=0.029..2.034 rows=4085 loops=1)"
" Filter: ((bfi_score IS NOT NULL) AND (bfi_score > '0'::double precision))"
" Rows Removed by Filter: 3324"
" Buffers: shared read=89"
" -> Index Only Scan using org_ass_publication_idx on organisation_association oa (cost=0.43..11.34 rows=1 width=8) (actual time=0.011..0.012 rows=0 loops=3417)"
" Index Cond: ((publication_id = p.id) AND (organisation_id = ANY ('{249189578,249189824}'::bigint[])))"
" Heap Fetches: 712"
" Buffers: shared hit=20311 read=915"
" -> Materialize (cost=0.00..53679.95 rows=845773 width=12) (actual time=0.012..93.456 rows=852969 loops=712)"
" Buffers: shared read=20873, temp read=1774656 written=2496"
" -> Seq Scan on metric_value mv_s (cost=0.00..45321.09 rows=845773 width=12) (actual time=0.043..470.590 rows=852969 loops=1)"
" Filter: ((name)::text = 'citations'::text)"
" Rows Removed by Filter: 1102878"
" Buffers: shared read=20873"
" SubPlan 1"
" -> Nested Loop (cost=0.85..16.91 rows=1 width=0) (actual time=0.002..0.002 rows=0 loops=607313928)"
" Buffers: shared read=1953"
" -> Index Scan using idx_w4wbsbxcqvjmqu64ubjlmqywdy on publication_metrics pm_s (cost=0.43..8.45 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=607313928)"
" Index Cond: (metrics_id = mv_s.metrics_id)"
" Filter: (publication_id = p.id)"
" Rows Removed by Filter: 1"
" -> Index Scan using metrics_pkey on metrics m_s (cost=0.43..8.45 rows=1 width=8) (actual time=0.027..0.027 rows=0 loops=3108)"
" Index Cond: (id = mv_s.metrics_id)"
" Filter: (source_id = 210247389)"
" Rows Removed by Filter: 1"
" Buffers: shared hit=10496 read=1953"
"Planning Time: 1.833 ms" "Execution Time: 1549621.523 ms"