2

Consider:

create table tab (foo text not null);
create index tab_ix_foo on tab(foo);

select * from tab where foo like 'pre%';

Postgres doesn't use the index to do that search. When using the collation "POSIX", Postgres uses the index: http://sqlfiddle.com/#!12/ed1cc/1

When using the collation "en_US", Postgres uses a sequential scan: http://sqlfiddle.com/#!12/efb18/1

Why the difference?

jameshfisher
  • 34,029
  • 31
  • 121
  • 167
  • 1
    Given that this is an empty table, could you provide anything that suggests that such an index is or is not used with a table when there are many records? The example is entirely useless regarding index issues because there is no plan that beats a sequential scan through a single page. – Chris Travers Mar 24 '13 at 11:07
  • @ChrisTravers Good point. eegg, please post `EXPLAIN ANALYZE` of the *real* problem queries. – Craig Ringer Mar 24 '13 at 11:34
  • Fiddles updated to insert data before running the `EXPLAIN`. – jameshfisher Mar 24 '13 at 11:48

1 Answers1

4

When using locales other than C (ie POSIX) you need to create your indexes for LIKE and ~ prefix text searches using the text_pattern_ops opclass. See the docs on Operator classes and index types. I'm sure there's a better reference than that docs page but I can't seem to find it at the moment.

If I alter your SQLFiddle to use text_pattern_ops for the en_US index you'll see that it's able to use the index:

create index tab_ix_foo on tab using btree (foo collate "en_US" text_pattern_ops);
--                                                              ^^^^^^^^^^^^^^^^

It is quite likely that you'll need to create different indexes for different collations if you're using the COLLATE option in 9.2+, since pretty much by definition different collations imply different orderings of strings and therefore different b-tree organisation. It appears that you are already doing this in your tests.

It's also possible that your data is just too small for index use to be particularly useful. Try testing with a more useful amount of data.

This post may be useful, as may the docs on Collation support.


For why you can't just use the same b-tree index for different collations, consider that b-trees require a stable and consistent ordering, but:

regress=> SELECT ' Bill''s' > ('bills' COLLATE "POSIX");
 ?column? 
----------
 f
(1 row)

regress=> SELECT ' Bill''s' > ('bills' COLLATE "en_US");
 ?column? 
----------
 t
(1 row)

As you can see, the collation changes the sort ordering. That's what a collation does, pretty much by definition. Trying to use the same index for different collations is like trying to use the same functional index for different functions; it makes absolutely no sense.

Craig Ringer
  • 307,061
  • 76
  • 688
  • 778
  • This post suggests that the index cannot be on the entire column: http://stackoverflow.com/questions/9213816/speeding-up-wildcard-text-lookups – Tim Mar 24 '13 at 11:20
  • @Tim There's no need to constrain the length or index on an expression, but there *are* circumstances in which that can be useful - it helps limit index size and can sometimes permit faster queries in exchange for the need for fiddlier queries, since your query has to use *exactly* the same expression as the index. – Craig Ringer Mar 24 '13 at 11:24
  • 1
    I don't feel like I should need anything other than a btree index in order to do prefix searching. I don't think the collation should be relevant to prefix searching -- assuming all collations define an ordering relation that groups prefixes together (possibly this assumption is my incorrect one). – jameshfisher Mar 24 '13 at 11:59
  • @eegg b-trees rely on stable ordering; using a b-tree created for `C` collation in `en_US` collation would break that ordering, since some sequences that used to be before others would instead be later. – Craig Ringer Mar 24 '13 at 12:21