0

I am quite new to optimizing the speed of a select, but I have the one below which is time consuming. I would be grateful for suggestions to improve performance.

SELECT DISTINCT p.id "pub_id",
            p.submission_year,
            pip.level,
            mv_s.integer_value "total_citations",
            1 "count_pub"
FROM publication p
JOIN organisation_association "oa" ON (oa.publication_id = p.id
                                   AND oa.organisation_id IN (249189578,
                                                              249189824))
JOIN bfi_2017 "pip" ON (p.uuid = pip.uuid
                    AND pip.bfi_score > 0
                    AND pip.bfi_score IS NOT NULL)
LEFT JOIN metric_value mv_s ON (mv_s.name = 'citations'
                            AND EXISTS
                              (SELECT *
                               FROM publication_metrics pm_s
                               JOIN metrics m_s ON (m_s.id = pm_s.metrics_id
                                                    AND m_s.source_id = 210247389
                                                    AND pm_s.publication_id = p.id
                                                    AND mv_s.metrics_id = m_s.id)))
WHERE p.peer_review = 'true'
  AND (p.type_classification_id IN (57360320,
                                57360322,
                                57360324,
                                57360326,
                                57360350))
  AND p.submission_year = 2017

Execute plan:

"Unique  (cost=532129954.32..532286422.32 rows=4084080 width=24) (actual time=1549616.424..1549616.582 rows=699 loops=1)"
"  Buffers: shared read=27411, temp read=1774656 written=2496"
"  ->  Sort  (cost=532129954.32..532169071.32 rows=15646800 width=24) (actual time=1549616.422..1549616.445 rows=712 loops=1)"
"        Sort Key: p.id, pip.level, mv_s.integer_value"
"        Sort Method: quicksort  Memory: 80kB"
"        Buffers: shared read=27411, temp read=1774656 written=2496"
"        ->  Nested Loop Left Join  (cost=393.40..529618444.45 rows=15646800 width=24) (actual time=1832.122..1549614.196 rows=712 loops=1)"
"              Join Filter: (SubPlan 1)"
"              Rows Removed by Join Filter: 607313310"
"              Buffers: shared read=27411, temp read=1774656 written=2496"
"              ->  Nested Loop  (cost=393.40..8704.01 rows=37 width=16) (actual time=5.470..125.773 rows=712 loops=1)"
"                    Buffers: shared hit=20313 read=4585"
"                    ->  Hash Join  (cost=392.97..7886.65 rows=72 width=16) (actual time=5.160..77.182 rows=3417 loops=1)"
"                          Hash Cond: ((p.uuid)::text = (pip.uuid)::text)"
"                          Buffers: shared hit=2 read=3670"
"                          ->  Bitmap Heap Scan on publication p  (cost=160.30..7643.44 rows=2618 width=49) (actual time=2.335..67.546 rows=4527 loops=1)"
"                                Recheck Cond: (submission_year = 2017)"
"                                Filter: (peer_review AND (type_classification_id = ANY ('{57360320,57360322,57360324,57360326,57360350}'::bigint[])))"
"                                Rows Removed by Filter: 3975"
"                                Heap Blocks: exact=3556"
"                                Buffers: shared hit=2 read=3581"
"                                ->  Bitmap Index Scan on idx_in2ix3rvuzxxf76bsipgn4l4sy  (cost=0.00..159.64 rows=8430 width=0) (actual time=1.784..1.784 rows=8502 loops=1)"
"                                      Index Cond: (submission_year = 2017)"
"                                      Buffers: shared read=27"
"                          ->  Hash  (cost=181.61..181.61 rows=4085 width=41) (actual time=2.787..2.787 rows=4085 loops=1)"
"                                Buckets: 4096  Batches: 1  Memory Usage: 324kB"
"                                Buffers: shared read=89"
"                                ->  Seq Scan on bfi_2017 pip  (cost=0.00..181.61 rows=4085 width=41) (actual time=0.029..2.034 rows=4085 loops=1)"
"                                      Filter: ((bfi_score IS NOT NULL) AND (bfi_score > '0'::double precision))"
"                                      Rows Removed by Filter: 3324"
"                                      Buffers: shared read=89"
"                    ->  Index Only Scan using org_ass_publication_idx on organisation_association oa  (cost=0.43..11.34 rows=1 width=8) (actual time=0.011..0.012 rows=0 loops=3417)"
"                          Index Cond: ((publication_id = p.id) AND (organisation_id = ANY ('{249189578,249189824}'::bigint[])))"
"                          Heap Fetches: 712"
"                          Buffers: shared hit=20311 read=915"
"              ->  Materialize  (cost=0.00..53679.95 rows=845773 width=12) (actual time=0.012..93.456 rows=852969 loops=712)"
"                    Buffers: shared read=20873, temp read=1774656 written=2496"
"                    ->  Seq Scan on metric_value mv_s  (cost=0.00..45321.09 rows=845773 width=12) (actual time=0.043..470.590 rows=852969 loops=1)"
"                          Filter: ((name)::text = 'citations'::text)"
"                          Rows Removed by Filter: 1102878"
"                          Buffers: shared read=20873"
"              SubPlan 1"
"                ->  Nested Loop  (cost=0.85..16.91 rows=1 width=0) (actual time=0.002..0.002 rows=0 loops=607313928)"
"                      Buffers: shared read=1953"
"                      ->  Index Scan using idx_w4wbsbxcqvjmqu64ubjlmqywdy on publication_metrics pm_s  (cost=0.43..8.45 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=607313928)"
"                            Index Cond: (metrics_id = mv_s.metrics_id)"
"                            Filter: (publication_id = p.id)"
"                            Rows Removed by Filter: 1"
"                      ->  Index Scan using metrics_pkey on metrics m_s  (cost=0.43..8.45 rows=1 width=8) (actual time=0.027..0.027 rows=0 loops=3108)"
"                            Index Cond: (id = mv_s.metrics_id)"
"                            Filter: (source_id = 210247389)"
"                            Rows Removed by Filter: 1"
"                            Buffers: shared hit=10496 read=1953"

"Planning Time: 1.833 ms" "Execution Time: 1549621.523 ms"

Bergi
  • 630,263
  • 148
  • 957
  • 1,375
pmelch
  • 101
  • 1
  • 3
  • 9
  • Is there an index on `pm_s.publication_id`? – jjanes Jun 10 '20 at 13:32
  • If I run the query mentioned here: [link] https://stackoverflow.com/questions/45983169/checking-for-existence-of-index-in-postgresql I get: index `publication_metrics_pkey` in column `publication_id` for table `publication_metrics` – pmelch Jun 12 '20 at 13:48

0 Answers0