I'm currently working on a search feature that ends up hitting the db with a LIKE query. It used to be of the form
WHERE some_id = blah AND some_timestamp > blah AND (field1 LIKE '%some_text%' OR field2 LIKE '%some_text%' OR ...) ORDER BY some_timestamp DESC
.
This hasn't scaled well now that the table is in the size of tens of millions of rows, especially when it is filtered on a very old timestamp.
After some research it looked like a trigram index might be more performant for the text searching. So I added a trigram index on all of the text fields concatenated and got good results initially. After varying the new query though I found a regression. An old index (btree on some_id and some_timestamp DESC) was not being hit anymore. So the new text searching helps with certain text queries that used to be very slow, and other text queries that used to be very fast (a few ms) due to the btree index are now super slow (see below).
Is there a way to get the best of both worlds? Fast trigram text searching and fast btree indexing for the queries that need it?
Notes:
Postgres 11.6
I tried a btree_gin index to index the timestamp column as well but got pretty much the same performance.
I slightly modified my query (concatenated whitespace) to bypass the trigram index and verified the slow queries return to the btree index and <10ms execution times.
I tried some query rearrangement to try to get both indexes hit to no avail.
Table:
table1
---------------------------------
some_id | bigint
field1 | text
field2 | text
field3 | text
field4 | text
field5 | text
field6 | bigint
some_timestamp | timestamp without time zone
Trigram Index:
CREATE INDEX CONCURRENTLY IF NOT EXISTS trgm_idx ON table1 USING gin ((COALESCE(field1, '') || ' ' || COALESCE(field2, '') || COALESCE(field3, '') || ' ' || COALESCE(field4, '') || ' ' || COALESCE(field5, '') || ' ' || field6::text) gin_trgm_ops);
Query:
SELECT *
FROM table1 i
WHERE i.some_id = 1
AND (COALESCE(field1, '') || ' ' || COALESCE(field2, '') || COALESCE(field3, '') || ' ' || COALESCE(field4, '') || ' ' || COALESCE(field5, '') || ' ' || field6::text) ILIKE '%some_text%'
AND i.some_timestamp > '2015-01-00 00:00:00.0'
ORDER BY some_timestamp DESC limit 20;
Explain:
Limit (cost=1043.06..1043.11 rows=20 width=446) (actual time=37240.094..37240.099 rows=20 loops=1)
-> Sort (cost=1043.06..1043.15 rows=39 width=446) (actual time=37240.092..37240.095 rows=20 loops=1)
Sort Key: some_timestamp
Sort Method: top-N heapsort Memory: 36kB
-> Bitmap Heap Scan on table1 i (cost=345.01..1042.03 rows=39 width=446) (actual time=1413.415..37202.331 rows=83066 loops=1)
Recheck Cond: ((((((((((COALESCE(field1, ''::text) || ' '::text) || COALESCE(field2, ''::text)) || COALESCE(field3, ''::text)) || ' '::text) || COALESCE(field4, ''::text)) || ' '::text) || COALESCE(field5, ''::text)) || ' '::text) || (field6)::text) ~~* '%some_text%'::text)
Rows Removed by Index Recheck: 23
Filter: ((some_timestamp > '2015-01-00 00:00:00'::timestamp without time zone) AND (some_id = 1))
Rows Removed by Filter: 5746666
Heap Blocks: exact=395922
-> Bitmap Index Scan on trgm_idx (cost=0.00..345.00 rows=667 width=0) (actual time=1325.867..1325.867 rows=5833670 loops=1)
Index Cond: ((((((((((COALESCE(field1, ''::text) || ' '::text) || COALESCE(field2, ''::text)) || COALESCE(field3, ''::text)) || ' '::text) || COALESCE(field4, ''::text)) || ' '::text) || COALESCE(field5, ''::text)) || ' '::text) || (field6)::text) ~~* '%some_text%'::text)
Planning Time: 0.252 ms
Execution Time: 37243.205 ms
(14 rows)