16

I am trying to index documents to be searchable on their tag array.

CREATE INDEX doc_search_idx ON documents
      USING gin( 
    to_tsvector('english', array_to_string(tags, ' ')) ||
    to_tsvector('english', coalesce(notes, '')))
)

Where tags is a (ci)text[]. However, PG will refuse to index array_to_string because it is not always immutable.

PG::InvalidObjectDefinition: ERROR:  functions in index expression must be marked IMMUTABLE

I've tried creating a homebrew array_to_string immutable function, but I feel like playing with fire as I don't know what I'm doing. Any way not to re-implement it?

Looks like I could just repackage the same function and label it immutable, but looks like there are risks when doing that.

How do I index the array for full-text search?

Community
  • 1
  • 1
Jonathan Allard
  • 18,429
  • 11
  • 54
  • 75

2 Answers2

14

In my initial answer I suggested a plain cast to text: tags::text. However, while most casts to text from basic types are defined IMMUTABLE, this it is not the case for array types. Obviously because (quoting Tom Lane in a post to pgsql-general):

Because it's implemented via array_out/array_in rather than any more direct method, and those are marked stable because they potentially invoke non-immutable element I/O functions.

Bold emphasis mine.

We can work with that. The general case cannot be marked as IMMUTABLE. But for the case at hand (cast citext[] or text[] to text) we can safely assume immutability. Create a simple IMMUTABLE SQL function that wraps the function. However, the appeal of my simple solution is mostly gone now. You might as well wrap array_to_string() (like you already pondered) for which similar considerations apply.

For citext[] (create separate functions for text[] if needed):

Either (based on a plain cast to text):

CREATE OR REPLACE FUNCTION f_ciarr2text(citext[]) 
  RETURNS text LANGUAGE sql IMMUTABLE AS 'SELECT $1::text';

This is faster.
Or (using array_to_string() for a result without curly braces):

CREATE OR REPLACE FUNCTION f_ciarr2text(citext[]) 
  RETURNS text LANGUAGE sql IMMUTABLE AS $$SELECT array_to_string($1, ',')$$;

This is a bit more correct.
Then:

CREATE INDEX doc_search_idx ON documents USING gin (
   to_tsvector('english', COALESCE(f_ciarr2text(tags), '')
                || ' ' || COALESCE(notes,'')));

I did not use the polymorphic type ANYARRAY like in your answer, because I know text[] or citext[] are safe, but I can't vouch for all other array types.

Tested in Postgres 9.4 and works for me.

I added a space between the two strings to avoid false positive matches across the concatenated strings. There is an example in the manual.

If you sometimes want to search just tags or just notes, consider a multicolumn index instead:

CREATE INDEX doc_search_idx ON documents USING gin (
             to_tsvector('english', COALESCE(f_ciarr2text(tags), '')
          ,  to_tsvector('english', COALESCE(notes,''));

The risks you are referring to apply to temporal functions mostly, which are used in the referenced question. If time zones (or just the type timestamptz) are involved, results are not actually immutable. We do not lie about immutability here. Our functions are actually IMMUTABLE. Postgres just can't tell from the general implementation it uses.

Related

Often people think they need text search, while similarity search with trigram indexes would be a better fit:

Not relevant in this exact case, but while working with citext, consider this:

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Actually, I've just had the chance to try this, and looks like PG interprets the `::text` as not immutable. `ERROR: functions in index expression must be marked IMMUTABLE : CREATE INDEX doc_search_idx ON documents USING gin( to_tsvector('english', coalesce(tag_cache::text, '')) )`. Have you tried it? – Jonathan Allard Jul 23 '15 at 20:10
  • I can create the index just fine, but I find that for some reason Postgres won't use it. EXPLAIN ANALYZE always shows a seq scan. The index works fine if I leave out the array column. – stickfigure Mar 04 '22 at 21:41
  • @stickfigure: Ask a new question with specifics if you can't figure it out. Comments are not the place. You can always link to this one for context and drop a comment here to link forward. – Erwin Brandstetter Mar 04 '22 at 22:04
  • This "non-immutable element I/O functions" sounds mysterious. What's an example of how `array_to_string` would have non-immutable behavior between one query and another on some array type? – Andy Feb 07 '23 at 19:09
  • @Andy: The text representation of some data types depends on session settings. Like `timestamptz`. Example: https://dbfiddle.uk/eDJekbv8. Related: https://stackoverflow.com/a/54384767/939860 – Erwin Brandstetter Feb 07 '23 at 22:52
  • @ErwinBrandstetter I see, so I guess Postgres function overloading currently doesn't make it possible to declare overloads IMMUTABLE for applicable element types? – Andy Feb 07 '23 at 23:26
  • @Andy: That's possible. But `array_to_string()` is a single polymorphic function. (Well, there is a 2nd variant with 3 args.) You can create such overloaded functions, even with `LANGUAGE internal` or `LANGUAGE c`, but you need superuser privileges, and it gets confusing and error-prone quickly. I would go with distinct function names. Examples: https://stackoverflow.com/a/54384767/939860, https://stackoverflow.com/a/11007216/939860 For anything more, please start a new question. – Erwin Brandstetter Feb 08 '23 at 00:01
2

Here's my naive solution, to wrap it and call it immutable, as suspected.

  CREATE FUNCTION immutable_array_to_string(arr ANYARRAY, sep TEXT)
    RETURNS text
    AS $$
      SELECT array_to_string(arr, sep);
    $$
    LANGUAGE SQL
    IMMUTABLE
  ;
Jonathan Allard
  • 18,429
  • 11
  • 54
  • 75