Are there any additional steps I can take to speed up query execution?
I have a table with more than 100m rows and I need to do search for matching strings. For that I checked two options:
- Compare text with to_tsvector
@@
(to_tsquery or plainto_tsquery)
This works very fast (under 1s on all data) but it has some problems with finding text similarity - Compare text with pg_trgm similarity This works fine on text comparison but works bad on large amount of data.
I found that I can use indexes to improve performance.
For my GiST index I tried to increase siglen
from small number to 2024, but for some reason Postgres uses 512
and not higher.
CREATE INDEX trgm_idx_512_gg ON table USING GIST (name gist_trgm_ops(siglen=512));
Query:
SELECT name, similarity(name, 'ноутбук MSI GF63 Thin 10SC 086XKR 9S7 16R512 086') as sm
FROM table
WHERE name % 'ноутбук MSI GF63 Thin 10SC 086XKR 9S7 16R512 086'
EXPLAIN
output:
Bitmap Heap Scan on table (cost=1632.01..40051.57 rows=9737 width=126)
Recheck Cond: ((name)::text % 'ноутбук MSI GF63 Thin 10SC 086XKR 9S7 16R512 086'::text)
-> Bitmap Index Scan on trgm_idx_512_gg (cost=0.00..1629.57 rows=9737 width=0)
Index Cond: ((name)::text % 'ноутбук MSI GF63 Thin 10SC 086XKR 9S7 16R512 086'::text)
Execution time was about 120 sec.
Question
How can I improve or speed up query? Maybe I need to use a different approach or just add something else?
Output for EXPLAIN (ANALYZE, BUFFERS)
(searching for a different name so that the search is completely new and not from the cache):
Bitmap Heap Scan on table (cost=1632.01..40051.57 rows=9737 width=126) (actual time=159119.258..159960.251 rows=5645 loops=1)
Recheck Cond: ((name)::text % 'Чехол на realme C25s / Реалми Ц25с c рисунком / прозрачный с принтом, Andy&Paul'::text)
Heap Blocks: exact=3795
Buffers: shared read=1289378
-> Bitmap Index Scan on trgm_idx_512_gg (cost=0.00..1629.57 rows=9737 width=0) (actual time=159118.616..159118.616 rows=5645 loops=1)
Index Cond: ((name)::text % 'Чехол на realme C25s / Реалми Ц25с c рисунком / прозрачный с принтом, Andy&Paul'::text)
Buffers: shared read=1285583
Planning:
Buffers: shared read=5
Planning Time: 4.063 ms
Execution Time: 159961.121 ms
I also created a GIN index (but Postgres kept using the GiST):
CREATE INDEX gin_gg ON table USING GIN (name gin_trgm_ops);
Size: 12 GB.
GIST index: 31GB