PostgreSQL increase group by over 30 million rows

Question

Is there any way to increase speed of dynamic group by query ? I have a table with 30 million rows.

create table if not exists tb
(
    id serial not null constraint tb_pkey primary key,
    week integer,
    month integer,
    year integer,
    starttime varchar(20),
    endtime varchar(20),
    brand smallint,
    category smallint,
    value real
);

The query below takes 8.5 seconds.

SELECT category from tb group by category

Is there any way to increase the speed. I have tried with and without index.

AdamKG · Answer 1 · 2018-11-10T21:13:51.037

For that exact query, not really; doing this operation requires scanning every row. No way around it.

But if you're looking to be able to quickly get the set of unique categories, and you have an index on that column, you can use a variation of the WITH RECURSIVE example shown in the edit to the question here (look towards the end of the question):

Counting distinct rows using recursive cte over non-distinct index

You'll need to change it to return the set of unique values instead of counting them, but it looks like a simple change:

testdb=# create table tb(id bigserial, category smallint);
CREATE TABLE
testdb=# insert into tb(category) select 2 from generate_series(1, 10000)
testdb-# ;
INSERT 0 10000
testdb=# insert into tb(category) select 1 from generate_series(1, 10000);
INSERT 0 10000
testdb=# insert into tb(category) select 3 from generate_series(1, 10000);
INSERT 0 10000
testdb=# create index on tb(category);
CREATE INDEX
testdb=# WITH RECURSIVE cte AS
  (
     (SELECT category
      FROM tb
      WHERE category >= 0
      ORDER BY 1
      LIMIT 1)
   UNION ALL SELECT
     (SELECT category
      FROM tb
      WHERE category > c.category
      ORDER BY 1
      LIMIT 1)
   FROM cte c
   WHERE category IS NOT NULL)
SELECT category
FROM cte
WHERE category IS NOT NULL;
 category 
----------
        1
        2
        3
(3 rows)

And here's the EXPLAIN ANALYZE:

    QUERY PLAN                                                                         
-----------------------------------------------------------------------------------------------------------------------------------------------------------
 CTE Scan on cte  (cost=40.66..42.68 rows=100 width=2) (actual time=0.057..0.127 rows=3 loops=1)
   Filter: (category IS NOT NULL)
   Rows Removed by Filter: 1
   CTE cte
     ->  Recursive Union  (cost=0.29..40.66 rows=101 width=2) (actual time=0.052..0.119 rows=4 loops=1)
           ->  Limit  (cost=0.29..0.33 rows=1 width=2) (actual time=0.051..0.051 rows=1 loops=1)
                 ->  Index Only Scan using tb_category_idx on tb tb_1  (cost=0.29..1363.29 rows=30000 width=2) (actual time=0.050..0.050 rows=1 loops=1)
                       Index Cond: (category >= 0)
                       Heap Fetches: 1
           ->  WorkTable Scan on cte c  (cost=0.00..3.83 rows=10 width=2) (actual time=0.015..0.015 rows=1 loops=4)
                 Filter: (category IS NOT NULL)
                 Rows Removed by Filter: 0
                 SubPlan 1
                   ->  Limit  (cost=0.29..0.36 rows=1 width=2) (actual time=0.016..0.016 rows=1 loops=3)
                         ->  Index Only Scan using tb_category_idx on tb  (cost=0.29..755.95 rows=10000 width=2) (actual time=0.015..0.015 rows=1 loops=3)
                               Index Cond: (category > c.category)
                               Heap Fetches: 2
 Planning time: 0.224 ms
 Execution time: 0.191 ms
(19 rows)

The number of loops it has to do the WorkTable scan node will be equal to the number of unique categories you have plus one, so it should stay very fast up to, say, hundreds of unique values.

Another route you can take is to add another table where you just store unique values of tb.category and have application logic check that table and insert their value when updating/inserting that column. This can also be done database-side with triggers; that solution is also discussed in the answers to the linked question.

PostgreSQL increase group by over 30 million rows

1 Answers1