4

I appreciate that LIKE queries are slow as they cannot be indexed. However, I am curious about the performance hit in a situation like this:

Say I have a table like:

user_id  |  message 
-------------------
   1     |  foo bar baz
   1     |  bar buz qux
   .     .      .
   .     .      .
   2     |  bux bar foo
   2     |  bar

where I have say 1 million rows, but 10,000 users, so each user has about 100 messages.

Clearly a search like:

SELECT * FROM table WHERE message like '%ar%';

is going to be very slow. However in my application I would only ever search a user's messages:

SELECT * FROM table WHERE message like '%ar%' AND user_id = 2;

where the user_id column would be indexed.

Am I right in thinking that in a scenario like this, Postgres would only ever perform the slow LIKE query on the users ~100 rows, after using the indexed user_id column, rather than the full table - thus limiting my performance hit?

And also that a query like this wouldn't get significantly slower with 10 or 100 million users, as long as any one user only had ~100 messages?

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
latentflip
  • 1,843
  • 3
  • 19
  • 22
  • When you know the id, why bother for the `LIKE`? I think the order is important, so you should first look for the user_id then for the message `user_id = 2 AND message LIKE '%ar%'` – DrColossos Mar 22 '12 at 10:38
  • Swapping the conditions won't make any difference. Postgres is smart enough for that. –  Mar 22 '12 at 10:40
  • @DrColossos - SQL is converted into a plan, and then executed. That plan is not a simplistic run through of the sql in the order you wrote it. Generation of the plan is far more intelligent than that. It depends on indexes, statistics, but not the order of those two conditions. Try the queries, with and without indexes, written in different orders, etc, and look at the plans. It's very illuminating. ***[Also, the `LIKE` filter has nothign to do with the user filter. The OP is looking for specific messages with the set of messages for that single user.]*** – MatBailie Mar 22 '12 at 10:47

2 Answers2

8

MatBailie already cleared your primary question. I want to address an assertion of yours:

I appreciate that LIKE queries are slow as they cannot be indexed.

This is not entirely true.

Firstly, and this has been true for a long time, left anchored patterns can use an index. This works for regular expressions (~) as well as LIKE (~~) and SIMILAR TO. I recently wrote a comprehensive review on the matter on dba.SE:

This may not work for you, because the patterns in your question are not anchored. If they were you could get optimized performance with a multicolumn index that uses the text pattern operator class text_pattern_ops for the message column like this:

CREATE INDEX tbl_user_id_message_idx ON tbl (user_id, message text_pattern_ops);

For queries like:

SELECT *
FROM   tbl
WHERE  user_id = 2
AND    message ~~ 'bar%'; -- left anchored LIKE

Secondly, since PostgreSQL 9.1 you can use the pg_trgm extension and create a GIST or GIN index with it, that all patterns can use. Some limitations apply. Maintenance of such an index is more expensive, so it is most useful for read-only or rarely written tables. Details:

Depesz has a related tutorial.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Thanks for that. I came across the fact that you could index like this before - and thought it would be perfect. Alas, I learned that anchored means at the start of the whole field, not just the start of any of the words which I thought it might be. – latentflip Mar 22 '12 at 14:58
  • While postgres happily uses indexes for left anchored regexes and like patterns, it can only do so when they are presented in literal form. If the pattern value is a placeholder in a prepared statement or sproc, the statement gets planned at prepare time and the planner has no visibility into the pattern's anchoring, so can't optimistically plan to use an index. – dbenhur Mar 22 '12 at 15:22
  • @latentflip: I may have a treat for you, but no time right now. :) – Erwin Brandstetter Mar 22 '12 at 17:53
  • @ErwinBrandstetter way to tease! – latentflip Mar 22 '12 at 19:54
3

The optimiser determines many things when compiling SQL into a plan.

One of them is how to filter data (with index seeks, etc) before applying other conditions on a row by row basis.


In your case, provided you have a suitable index the LIKE will only be applied to the records after that filtering is done.


To understand a bit more about it, get the plan that is created by your query. You should be able to see where indexes are used to sub-set/filter the data, and then a separate step applying the LIKE condition.

MatBailie
  • 83,401
  • 18
  • 103
  • 137