2

Background: -- Postgres 11.4 on RDS, 11.5 on macOS at home. -- The record_changes_log_detail table has about 8M in my test setup. -- The old_value field is of type citext. -- Values in the field range in length from 1 character to over 5,000. Most are short.

As far as I understand it, I need to use an expression index here as some of my values are too long for a B-tree entry. According to the Postgres 11 release notes:

"Allow creation of indexes that can be used by LIKE comparisons on citext columns (Alexey Chernyshov) To do this, the index must be created using the citext_pattern_ops operator class."

https://www.postgresql.org/docs/11/release-11.html

Here is such an index definition for my sample data:

CREATE INDEX record_changes_log_detail_old_value_ix_btree
    ON record_changes_log_detail 
    USING btree ((substring(old_value,1,1024)::citext) citext_pattern_ops);

If I run this query analysis, I can see that the index is used:

set max_parallel_workers_per_gather = 0; -- Don't tempt the planner to run a sequential scan.
explain analyze
select * from record_changes_log_detail where substring(old_value,1,1024)::citext = 'Gold Kerrison Neuro';

So, the = search works as hoped for. However, this query does not use the index:

select * from record_changes_log_detail where substring(old_value,1,1024)::citext LIKE 'Gold Kerrison Neuro%';

Is there some trick to getting citext indexes to work with LIKE queries that I've missed, or is this more likely a bug? For comparison, a LIKE query does use the index if you build it with text_pattern_ops. But then, of course, it's case-sensitive.

Case-sensitivity

Follow-up here to a comment on case-sensitivity. I think that `pg_trgm' is case-blind, but don't have time for a thorough review. As a quick check, these three comparisons each return 1, a perfect match.

select similarity('hello world','hello world');
select similarity('hello world','HELLO WORLD');
select similarity('Hello World','hello world');
Morris de Oryx
  • 1,857
  • 10
  • 28
  • I don't have an answer, but I suspect the behavior you are seeing has to do with that you are using a functional index, rather than something to do with `citext`. – Tim Biegeleisen Sep 19 '19 at 01:45
  • Also, perhaps you want to wait for an answer to [your previous question](https://stackoverflow.com/questions/58002565/searching-on-expression-indexes) before asking another very similar question. – Tim Biegeleisen Sep 19 '19 at 01:49
  • Fair comments. And, yes, I'm struggling a bit to ask questions that are short enough, but have enough detail...narrow and distinct enough, but not too overlapping. I'm new to SO and trying to do things appropriately. It's not going great ;-) – Morris de Oryx Sep 19 '19 at 03:03
  • Regarding the function part of this, I've updated my answer. I tested this all out a fair few ways, and the index *is* used for LIKE when you build with text ops instead of citext ops. – Morris de Oryx Sep 19 '19 at 03:08
  • One workaround would be to just maintain a lowercase column version of the `old_value` column. Then, just add a regular BTree index and `LIKE` should be able to use it. – Tim Biegeleisen Sep 19 '19 at 03:20
  • Erwin Brandstetter also suggested thinking about a shadow field populated with LEFT, hash text(), etc. I've been not wanting to do that, but something like LEFT(old_value,1024) would cover nearly 100% of the rows and make searches simple. Just search old_value_trimmed and no formula is required. -- Thanks. – Morris de Oryx Sep 19 '19 at 04:27

1 Answers1

4

I sent a message to the PG bugs mailing list on this subject and got back an answer from Tom Lane. My summary of the answer is "it's a documentation bug." The citext_pattern_ops does not support LIKE queries.

I figured I'd post the info here for the sake of the archives.

And, following up on a suggestion from Laurenz Albe, I gave the Postgres tri-gram implementation a try. They rule!

DROP INDEX IF EXISTS record_changes_log_detail_old_value_ix_tgrm;
CREATE INDEX record_changes_log_detail_old_value_ix_tgrm
    ON record_changes_log_detail 
    USING gin (old_value gin_trgm_ops);

The secret here when you're using citext is to cast your value to ::text, like so:

select * from record_changes_log_detail 
where old_value::text LIKE '%Gold Kerrison Neuro%';

Run that with explain analyze to confirm that the index is used.

Morris de Oryx
  • 1,857
  • 10
  • 28