
The query below is taking a long time in my batch job because of the pattern matching in the join condition.

Query:

select a.id, b.code
from table a
left join table b
  on a.desc_01 like '%' || b.desc_02 || '%';

I have tried the LEFT and STRPOS functions to improve the performance, but I end up losing some rows when I apply them.

Any other suggestions, please?

thiru8587
  • I have no other key columns to match on in this join. Table b is the master table which holds the list of codes and description details. If the description in table a matches the master table b, I need to get the description code from table b. – thiru8587 Oct 22 '19 at 09:14
  • `a.desc_01 ~ b.desc_02`? – 404 Oct 22 '19 at 09:22
  • Hi 404, I tried this and it still has the same performance issue. – thiru8587 Oct 22 '19 at 12:28

2 Answers


It's not entirely clear what your data (or structure) really looks like, but your search is performing a contains comparison. That's not the simplest thing to optimize, because a standard index, and many matching algorithms, are biased towards the start of the string. When you lead with %, a B-tree can't be used efficiently, since it splits/branches based on the front of the string.

Depending on how you really want to search, have you considered trigram indexes? They're pretty great. Your string gets split into three-letter chunks, which overcomes a lot of the problems with left-anchored text comparison. The reason is simple: now every character is the start of a short, left-anchored chunk. There are traditionally two methods of generating trigrams (n-grams), one with leading padding and one without. Postgres uses padding, which is the better default. I got help with a related question recently that may be relevant to you:

Searching on expression indexes
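
For illustration (assuming the pg_trgm extension is installed, and borrowing the 'automobiles' value from the comments), show_trgm() makes the padded chunking visible:

-- pg_trgm lowercases and pads the string with spaces, then splits it into
-- three-character chunks, so every character starts a short, left-anchored chunk
select show_trgm('automobiles');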

If you want something more like a keyword match, then full-text search might help. I hadn't been using it much because I've got a data set where converting words to "lexemes" doesn't make sense. It turns out that you can tell the parser to use the "simple" dictionary instead, which gets you a unique word list without any stemming transformations. Here's a recent question on that:

https://dba.stackexchange.com/questions/251177/postgres-full-text-search-on-words-not-lexemes/251185#251185
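
As a rough sketch of that idea, using the sample values from the comments (an illustration of the "simple" dictionary, not a drop-in replacement for your join):

-- 'simple' splits the text into words without stemming or stop-word removal
select to_tsvector('simple', 'Parts of automobiles')
       @@ plainto_tsquery('simple', 'automobiles');  -- whole-word match, so this is true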

If that sounds more like what you need, you might also want to get rid of stop/skip/noise words. Here's a thread that I think is a bit clearer than the docs on how to set this up (it's not hard):

https://dba.stackexchange.com/questions/145016/finding-the-most-commonly-used-non-stop-words-in-a-column/186754#186754
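
A minimal sketch of that setup, assuming you want a "simple"-style configuration that only drops the English stop words shipped with Postgres (the names english_nostop and simple_nostop are made up for illustration):

-- dictionary: the simple template plus the english.stop word list
create text search dictionary english_nostop (
    template  = simple,
    stopwords = english
);

-- configuration: start from 'simple' and map word tokens to the new dictionary
create text search configuration simple_nostop (copy = simple);
alter text search configuration simple_nostop
    alter mapping for asciiword, word with english_nostop;

-- 'of' should now be dropped as a stop word
select to_tsvector('simple_nostop', 'Parts of automobiles');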

Morris de Oryx
  • Dear @Morris de Oryx, thanks for your reply. I will go through the links you shared. Also, the data in table a will be like desc_01='Parts of automobiles', and table b (the master table) will have data like code='code01' and desc_02='automobiles'; the data will be compared dynamically like this to get code01. Table a has more than 1 million records and keeps growing. – thiru8587 Oct 22 '19 at 12:25

The long-term answer is to clean up and re-organize your data so you don't need to do this.

Using a pg_trgm index might be the short-term answer.

create extension pg_trgm;  -- provides trigram matching and the gin_trgm_ops operator class
create index on a using gin (desc_01 gin_trgm_ops);  -- trigram GIN index on the column searched with LIKE '%...%'

How fast this will be is going to depend on what is in b.desc_02.
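
One way to sanity-check whether the index changes anything (not verified against your data; table_a and table_b stand in for your real table names) is to compare EXPLAIN output before and after creating it:

-- inspect the plan for the original join once the trigram index exists
explain
select a.id, b.code
from table_a a
left join table_b b
  on a.desc_01 like '%' || b.desc_02 || '%';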

jjanes