
I need to create a view with some calculated and aggregated values, which means I need certain values multiple times, like total_dist_pts in the example below.

There is a loc_a_run table with about 350 rows so far (constantly growing) and a loc_a_pline table with somewhat more than 4 million rows (also growing):

prod=# \d loc_a_run
                                         Table "public.loc_a_run"
      Column    |           Type           | Collation | Nullable |           Default
----------------+--------------------------+-----------+----------+-----------------------------
 loc_a_run_id   | integer                  |           | not null | generated always as identity
 execution_time | timestamp with time zone |           | not null | CURRENT_TIMESTAMP(0)
 run_type_id    | smallint                 |           | not null |
 ...
 has_errors     | boolean                  |           |          |
Indexes:
    "loc_a_run_pkey" PRIMARY KEY, btree (loc_a_run_id)


prod=# \d loc_a_pline
                                     Table "public.loc_a_pline"
      Column    |    Type   | Collation | Nullable |           Default
----------------+-----------+-----------+----------+-----------------------------
 loc_a_pline_id | bigint    |           | not null | generated always as identity
 loc_a_run_id   | integer   |           | not null |
 is_top         | boolean   |           | not null |
 ...
 dist_left      | numeric   |           | not null |
 dist_right     | numeric   |           | not null |
 ...
Indexes:
    "loc_a_pline_pkey" PRIMARY KEY, btree (loc_a_pline_id)
Foreign-key constraints:
    "loc_a_pline_loc_a_run_id_fkey" FOREIGN KEY (loc_a_run_id) REFERENCES loc_a_run(loc_a_run_id) ON UPDATE CASCADE ON DELETE CASCADE

The solution I use right now:

SELECT run.loc_a_run_id AS run_id
     , run_type.run_type
     , SUM(
          CASE
            WHEN pline.is_top IS true
            THEN ROUND(pline.dist_right - pline.dist_left, 2)
            ELSE ROUND(pline.dist_left - pline.dist_right, 2)
          END)
       AS total_dist_pts
     , COUNT(pline.loc_a_pline_id) AS total_plines
     , SUM(
          CASE
            WHEN pline.is_top IS true
            THEN ROUND(pline.dist_right - pline.dist_left, 2)
            ELSE ROUND(pline.dist_left - pline.dist_right, 2)
          END)
       / COUNT(pline.loc_a_pline_id)
       AS dist_pts_per_pline

FROM  loc_a_run AS run
JOIN  loc_a_pline AS pline USING (loc_a_run_id)
JOIN  run_type USING (run_type_id)
WHERE run.has_errors IS false
GROUP BY run_id, run_type;

Query plan:

"Finalize GroupAggregate  (cost=154201.17..154577.71 rows=1365 width=108)"
"  Group Key: run.loc_a_run_id, run_type.run_type"
"  ->  Gather Merge  (cost=154201.17..154519.69 rows=2730 width=76)"
"        Workers Planned: 2"
"        ->  Sort  (cost=153201.15..153204.56 rows=1365 width=76)"
"              Sort Key: run.loc_a_run_id, run_type.run_type"
"              ->  Partial HashAggregate  (cost=153113.01..153130.07 rows=1365 width=76)"
"                    Group Key: run.loc_a_run_id, run_type.run_type"
"                    ->  Hash Join  (cost=21.67..120633.75 rows=1623963 width=62)"
"                          Hash Cond: (run.run_type_id = run_type.run_type_id)"
"                          ->  Hash Join  (cost=20.55..112756.41 rows=1623963 width=32)"
"                                Hash Cond: (pline.loc_a_run_id = run.loc_a_run_id)"
"                                ->  Parallel Seq Scan on loc_a_pline pline  (cost=0.00..107766.55 rows=1867855 width=30)"
"                                ->  Hash  (cost=17.14..17.14 rows=273 width=6)"
"                                      ->  Seq Scan on loc_a_run run  (cost=0.00..17.14 rows=273 width=6)"
"                                            Filter: (has_errors IS FALSE)"
"                          ->  Hash  (cost=1.05..1.05 rows=5 width=34)"
"                                ->  Seq Scan on loc_a_run_type run_type  (cost=0.00..1.05 rows=5 width=34)"

This takes around 14.2s to execute. I lack the experience to assess how good or bad the performance is for this part, but I could live with it. Of course, faster would be an advantage.

Because this contains duplicated code, I tried to get rid of it by using a CTE (the final view needs this for a few more calculations, but the pattern is the same):

WITH dist_pts AS (
    SELECT  loc_a_run_id
          , CASE
                WHEN is_top IS true
                THEN ROUND(dist_right - dist_left, 2)
                ELSE ROUND(dist_left - dist_right, 2)
            END AS pts
    FROM loc_a_pline
)

SELECT run.loc_a_run_id AS run_id
     , run_type.run_type
     , SUM(dist_pts.pts) AS total_dist_pts
     , COUNT(pline.loc_a_pline_id) AS total_plines
     , SUM(dist_pts.pts) / COUNT(pline.loc_a_pline_id) AS dist_pts_per_pline
FROM   loc_a_run AS run
JOIN   dist_pts USING (loc_a_run_id)
JOIN   loc_a_pline AS pline USING (loc_a_run_id)
JOIN   run_type USING (run_type_id)
WHERE  run.has_errors IS false
GROUP BY run_id, run_type;

Query plan:

"Finalize GroupAggregate  (cost=575677889.59..575678266.13 rows=1365 width=108)"
"  Group Key: run.loc_a_run_id, run_type.run_type"
"  ->  Gather Merge  (cost=575677889.59..575678208.12 rows=2730 width=76)"
"        Workers Planned: 2"
"        ->  Sort  (cost=575676889.57..575676892.98 rows=1365 width=76)"
"              Sort Key: run.loc_a_run_id, run_type.run_type"
"              ->  Partial HashAggregate  (cost=575676801.43..575676818.49 rows=1365 width=76)"
"                    Group Key: run.loc_a_run_id, run_type.run_type"
"                    ->  Parallel Hash Join  (cost=155366.81..111024852.15 rows=23232597464 width=62)"
"                          Hash Cond: (loc_a_pline.loc_a_run_id = run.loc_a_run_id)"
"                          ->  Parallel Seq Scan on loc_a_pline  (cost=0.00..107877.85 rows=1869785 width=22)"
"                          ->  Parallel Hash  (cost=120758.30..120758.30 rows=1625641 width=48)"
"                                ->  Hash Join  (cost=21.67..120758.30 rows=1625641 width=48)"
"                                      Hash Cond: (run.run_type_id = run_type.run_type_id)"
"                                      ->  Hash Join  (cost=20.55..112872.83 rows=1625641 width=18)"
"                                            Hash Cond: (pline.loc_a_run_id = run.loc_a_run_id)"
"                                            ->  Parallel Seq Scan on loc_a_pline pline  (cost=0.00..107877.85 rows=1869785 width=12)"
"                                            ->  Hash  (cost=17.14..17.14 rows=273 width=6)"
"                                                  ->  Seq Scan on loc_a_run run  (cost=0.00..17.14 rows=273 width=6)"
"                                                        Filter: (has_errors IS FALSE)"
"                                      ->  Hash  (cost=1.05..1.05 rows=5 width=34)"
"                                            ->  Seq Scan on loc_a_run_type run_type  (cost=0.00..1.05 rows=5 width=34)"

This takes forever and seems to be the wrong approach. I struggle to understand the query plan to find my mistake(s).

So my questions are:

  • Why does the CTE approach take so much time?
  • What would be the smartest solution to avoid duplicated code and possibly reduce execution time?
  • Is there a way to SUM(dist_pts.pts) only once?
  • Is there a way to COUNT(pline.loc_a_pline_id) in the same go as the subtraction in the CTE instead of accessing the big loc_a_pline table again? (is it accessed again at all?)

Any help is highly appreciated.

nadine

1 Answer


Consider creating an index on loc_a_pline.loc_a_run_id. Postgres doesn't automatically create indexes on the referencing side of foreign-key relationships. That should drastically improve the speed and remove the sequential scans over loc_a_pline from the execution plan.
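For example (this is the same statement the asker later posted in the comments; the index name is arbitrary):

```sql
-- Index the referencing column of the FK so joins and filters on
-- loc_a_run_id can use an index scan instead of sequentially
-- scanning the 4-million-row table.
CREATE INDEX idx_loc_a_pline_loc_a_run_id ON loc_a_pline (loc_a_run_id);
```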


As for why the CTE version is so slow: it reads loc_a_pline twice (note the two Parallel Seq Scans in the plan) and joins dist_pts back to pline on the non-unique loc_a_run_id, so every row of a run pairs with every other row of the same run, which is where the rows=23232597464 estimate comes from. Additionally, I'd suggest gathering all the data you want in the first portion of the CTE and then separating the aggregates out into their own portion. Something like this, which accesses each table once and aggregates over the set once:

WITH dist_pts AS (
    SELECT run.loc_a_run_id rid
         , pline.loc_a_pline_id pid
         , rt.run_type
         , CASE
               WHEN pline.is_top IS true
               THEN ROUND(pline.dist_right - pline.dist_left, 2)
               ELSE ROUND(pline.dist_left - pline.dist_right, 2)
           END AS pts
      FROM loc_a_run run
      JOIN loc_a_pline pline ON run.loc_a_run_id = pline.loc_a_run_id
      JOIN run_type rt ON run.run_type_id = rt.run_type_id
     WHERE run.has_errors IS FALSE
), aggs AS
(
    SELECT SUM(dist_pts.pts)::NUMERIC total_dist_pts
         , COUNT(dist_pts.pid)::NUMERIC total_plines
         , dist_pts.rid
      FROM dist_pts
     GROUP BY dist_pts.rid
)
SELECT dp.rid
     , dp.run_type
     , aggs.total_dist_pts
     , aggs.total_plines
     , (aggs.total_dist_pts / aggs.total_plines) dist_pts_per_pline
  FROM dist_pts dp
  JOIN aggs ON dp.rid = aggs.rid
 GROUP BY dp.rid, dp.run_type, aggs.total_dist_pts, aggs.total_plines
;
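A sketched variant (untested against the real schema): since `run_type` is fixed per `loc_a_run_id`, it can be carried through the `GROUP BY` of the aggregate step directly. That drops the final join back to the multi-million-row `dist_pts` set, computes each aggregate exactly once, and removes the outer `GROUP BY` entirely:

```sql
WITH dist_pts AS (
    SELECT run.loc_a_run_id AS rid
         , pline.loc_a_pline_id AS pid
         , rt.run_type
         , CASE
               WHEN pline.is_top
               THEN ROUND(pline.dist_right - pline.dist_left, 2)
               ELSE ROUND(pline.dist_left - pline.dist_right, 2)
           END AS pts
      FROM loc_a_run run
      JOIN loc_a_pline pline ON run.loc_a_run_id = pline.loc_a_run_id
      JOIN run_type rt ON run.run_type_id = rt.run_type_id
     WHERE run.has_errors IS FALSE
), aggs AS (
    -- run_type is functionally dependent on rid, so adding it to the
    -- GROUP BY does not change the grouping, only makes it selectable.
    SELECT rid
         , run_type
         , SUM(pts) AS total_dist_pts
         , COUNT(pid)::numeric AS total_plines   -- cast to avoid integer division
      FROM dist_pts
     GROUP BY rid, run_type
)
SELECT rid
     , run_type
     , total_dist_pts
     , total_plines
     , total_dist_pts / total_plines AS dist_pts_per_pline
  FROM aggs;
```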
J Spratt
  • Thanks a lot for your answer. Your approach looks so much cleaner and executes faster (took ~10s without index). I added `aggs.total_dist_pts, aggs.total_plines` to the `GROUP BY` clause. But after creating an index on `loc_a_pline.loc_a_run_id`, it's still doing the seq scan on `loc_a_pline`. Do you have an idea why the index is not used? I did `CREATE INDEX idx_loc_a_pline_loc_a_run_id ON loc_a_pline (loc_a_run_id)` – nadine Sep 13 '22 at 21:36
  • Now an index scan is performed without me changing anything. Maybe I didn't wait long enough. But the execution time is still around 10s. – nadine Sep 14 '22 at 06:17
  • You're welcome. I thought you might find it helpful. It seems that you're running against all of the data possible so after creating that index, you might not see an immediate difference. If you were to limit the set to a specific `loc_a_run_id` (`WHERE loc_a_run_id = N`), you would see major improvements. Imagine trying to limit the dataset by providing that ID and then still having to sequentially scan ALL of `loc_a_pline` to find the join matches. If you're curious, you should give that a try with and without the index and observe the performance gain. – J Spratt Sep 14 '22 at 14:14
  • You would apply that `WHERE` condition in the first block of the CTE. – J Spratt Sep 14 '22 at 14:17
  • It makes sense that the index gets more powerful when a smaller subset is selected. I have completely neglected the index topic and will read through the documentation and play with it. If I select all `pline` records of one `pline.loc_a_run_id`, it only takes a few tenths of a second. Thank you for the hint. – nadine Sep 14 '22 at 20:05
  • Regarding the type cast: the columns `pline.dist_right` and `pline.dist_left` are of type numeric. I understand that the result of `COUNT` has to be cast to a decimal data type, because it returns an integer. So it should be fine if just `aggs.total_plines` gets cast to numeric, because `ROUND` returns numeric, right? Is there an advantage to casting both `aggs.total_dist_pts` and `aggs.total_plines` to double precision? – nadine Sep 14 '22 at 20:06
  • You can use `NUMERIC`/`DECIMAL`, `DOUBLE PRECISION`, or `REAL` depending on how much precision you're looking for. `NUMERIC` is probably best for exactness but the tradeoff is variable storage size. The comparisons related to storage size are really getting into the nitty gritty and I'll leave it to you to research that. I used double because that's what I use most commonly. You're right, you can just cast the result of `COUNT()` and prevent integer division. `SUM()` returns the same type as the values it's operating on. I'd just cast both to be explicit. – J Spratt Sep 14 '22 at 20:37
  • The nitty gritty of storage size is called "alignment padding". You can start here https://stackoverflow.com/a/7431468/9351817. If you haven't heard of this book on indexes and performance (SQL Performance Explained by Markus Winand), I highly recommend it to all DB beginners/intermediates. You can read it for free here too: https://use-the-index-luke.com/ – J Spratt Sep 14 '22 at 20:41