I have a table with 2.2 Million rows.
Table "public.index"
Column | Type | Modifiers
-----------+-----------------------------+-----------------------------------------------------
fid | integer | not null default nextval('index_fid_seq'::regclass)
location | character varying |
Indexes:
"index_pkey" PRIMARY KEY, btree (fid)
"location_index" btree (location text_pattern_ops)
The location is the full path to a file, but I need to query using the name of the folder the file is located in. That folder name is unique in the table.
To avoid %
at the beginning, I search for the full path which I know:
select fid from index where location like '/path/to/folder/%'
Explain Analyze:
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on index (cost=0.00..120223.34 rows=217 width=4) (actual time=1181.701..1181.701 rows=0 loops=1)
Filter: ((location)::text ~~ '/path/to/folder/%'::text)
Rows Removed by Filter: 2166034
Planning time: 0.954 ms
Execution time: 1181.748 ms
(5 rows)
The question is not how to make a workaround, because I have found that for my case:
When creating a foldername_index
create index on index (substring(location, '(?<=/path/to/)[^\/]*');
I can succesfully use the folder_name to query:
explain analyze select fid from index where substring(location, '(?<=/path/to/)[^\/]*') = 'foldername';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on index (cost=600.49..31524.74 rows=10830 width=12) (actual time=0.030..0.030 rows=1 loops=1)
Recheck Cond: ("substring"((location)::text, '(?<=/path/to/)[^\/]*'::text) = 'folder_name'::text)
Heap Blocks: exact=1
-> Bitmap Index Scan on foldername_index (cost=0.00..597.78 rows=10830 width=0) (actual time=0.023..0.023 rows=1 loops=1)
Index Cond: ("substring"((location)::text, '(?<=/path/to/)[^\/]*'::text) = 'folder_name'::text)
Planning time: 0.115 ms
Execution time: 0.059 ms
(7 rows)
I have followed the PostgreSQL FAQ:
When using wild-card operators such as LIKE or ~, indexes can only be used in certain circumstances:
The beginning of the search string must be anchored to the start of the string, i.e.
LIKE patterns must not start with % or _.
The search string can not start with a character class, e.g. [a-e].
Everyting not the case in my query.
C locale must be used during initdb because sorting in a non-C locale often doesn't match the behavior of LIKE. You can create a special text_pattern_ops index that will work in such cases, but note it is only helpful for LIKE indexing.
I have C Locale:
# show LC_COLLATE;
lc_collate
------------
C
(1 row)
I also followed the instructions from this great answer here on Stack Overflow, which is why I use text_pattern_ops
which did not change anything. Unfortunately, I cannot install new modules.
So: Why does my query perform a seq scan?