0

I have an inner join on regular expressions - it is very slow. Is there any easy way to speed this up? I am using postgres.

FROM A
inner join B ON trim(lower(replace(replace(replace(B.enginequery,',',' '),'"',' '),'+',' '))) = trim(lower(A.keyphrase))
             OR trim(lower(replace(replace(replace(B.enginequery,',',' '),'"',' '),'+',' '))) ~ (trim(lower(A.keyphrase)) || '$')
             OR trim(lower(replace(replace(replace(B.enginequery,',',' '),'"',' '),'+',' '))) ~ (trim(lower(A.keyphrase)) || ' ')
OMG Ponies
  • 325,700
  • 82
  • 523
  • 502
alex
  • 207
  • 1
  • 4
  • 17
  • Alex: Did you get the answer you were looking for? If so, could you please accept it? If not, could you clarify what you are still looking for? Usually, the more information you provide, the more likely it is someone can help you. – Jeff Maass Jul 01 '10 at 00:33

2 Answers2

3

Is there any easy way to speed this up?

The reason performance suffers is all the operations, let alone the regex, that have to be performed just to make a match. You need to simplify the relationship so these don't need to be performed.

OMG Ponies
  • 325,700
  • 82
  • 523
  • 502
  • It sure does look like your schema needs re-thinking. Joining on expressions like this is always going to be slow. You could (possibly, I don't know your full schema) side-step the problem with a derived table to limit the work done, but it's not the solution. – d11wtq Jun 05 '10 at 00:43
  • 1
    A functional index *could* help, but you really need to rethink that design that requires that. – rfusca Jun 05 '10 at 05:40
1

I would start by placing the results of :

 trim(lower(replace(replace(replace(B.enginequery,',',' '),'"',' '),'+',' ')))

into a column in your table. At least then one wouldn't have to repeatedly calculate it. How you implement that in postgres I don't know. In Ms sql server, I would try a calculated column so that my apps wouldn't have to know about updating B.enginequery and its cleaned version.

And then, I would probably end up attempting an index on that cleaned up column.

Jeff Maass
  • 3,632
  • 3
  • 26
  • 30