3

I'm new to Postgres and I'm trying to understand indexes a bit more. I'm using version 12.5 and here is my code:

CREATE TABLE textfun(content TEXT);
CREATE UNIQUE INDEX text_b ON textfun(content);

INSERT INTO textfun (content)
SELECT (CASE WHEN (random()<=0.3) THEN 'https://mywebsite/nanana/'
WHEN (random()<=0.6) THEN 'https://mywebsite/friendy/'
ELSE 'https://mywebsite/mina/' END) || generate_series(1000000,2000000);

Here, I created a million records hoping to see the effect of indexing.

When I try to get a query plan:

explain analyze
SELECT content FROM textfun WHERE content LIKE 'mina%'; 

I get this back:

Gather  (cost=1000.00..14300.34 rows=100 width=32) (actual time=77.574..80.054 rows=0 loops=1)
  Workers Planned: 2    
  Workers Launched: 2
  Parallel Seq Scan on textfun  (cost=0.00..13290.34 rows=42 width=32) (actual time=69.022..69.022 rows=0 loops=3)
     Filter: (content ~~ 'mina%'::text)
     Rows Removed by Filter: 333334  
Planning Time: 0.254 ms  
Execution Time: 80.071 ms 
(8 rows)

I expected a Parallel Index Scan.

I tried:

explain analyze
SELECT content FROM textfun WHERE content LIKE '1500000%';

and:

explain analyze
SELECT content FROM textfun WHERE content LIKE '%mina';

but both give me a sequential scan plan.

Is there a detail that I'm missing here, why am I not getting an Index Scan?

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
Israel Phiri
  • 109
  • 1
  • 11
  • 1
    Please do not include the execution plan as a "quote". The indention is vital for reading and understanding the plan –  Jul 14 '21 at 20:35

2 Answers2

5

To support a LIKE condition you need to create the index using the text_pattern_ops

CREATE UNIQUE INDEX text_b ON textfun(content text_pattern_ops);

With that, the result is the following execution plan:

Bitmap Heap Scan on textfun  (cost=191.68..7654.53 rows=5000 width=32) (actual time=2.553..2.554 rows=0 loops=1)
  Filter: (content ~~ '1500000%'::text)
  ->  Bitmap Index Scan on text_b  (cost=0.00..190.43 rows=5000 width=0) (actual time=2.550..2.550 rows=0 loops=1)
        Index Cond: ((content ~>=~ '1500000'::text) AND (content ~<~ '1500001'::text))
Planning Time: 6.247 ms
Execution Time: 6.809 ms

Online example

4

Correctness first. Your predicate will not find 'https://mywebsite/mina/':

content LIKE 'mina%'

One of these would work:

content LIKE '%mina%'
content ~ 'mina'

But neither can be supported with a btree index. A trigram index would do that:

CREATE INDEX ON textfun USING gin (content gin_trgm_ops);

See:

Or maybe a text search index (after separating words in your URIs ...)

But a GIN index cannot enforce uniqueness. You might need a second B-tree index. (Or constraint that implies such an index).

To actually support your original, left-anchored predicate, I would use a B-tree index with COLLATE "C":

CREATE UNIQUE INDEX text_b ON textfun(content COLLATE "C");

Per-column collation support added with Postgres 9.1 largely obsoleted the old xxx_pattern_ops operator classes. See:

db<>fiddle here - with COLLATE "C", and fewer rows (enough to proof the point).

Here is a comprehensive overview of pattern matching options in Postgres:


Aside from that, your test case with only three distinct values is not very useful. Postgres will typically not use any indexes at all when it expects to fetch more than a few percent of all rows, because a sequential scan is typically faster for that. And it will know about the few "most common values" as it keeps statistics updated by ANALYZE (or autovacuum by default).

Searching for content LIKE '1500000%' is different in that respect, as Postgres will know that it's not common and use an applicable index ...

Index optimization depends on the complete picture: environment and requirements ...

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228