0

I would like to use postgresql's unaccent contrib package in my tsvector column. I have successfully installed the package using create extension unaccent, and have checked that it works. I had set up a trigger to automatically update the tsvector column,

CREATE TRIGGER tsvectorupdate BEFORE INSERT OR UPDATE
ON artists FOR EACH ROW EXECUTE PROCEDURE
tsvector_update_trigger(tsv_name, 'pg_catalog.simple', name);

and then set the tsv_name column on all existing rows using the following

UPDATE artists SET tsv_name = (to_tsvector('simple', coalesce(name, '')));

I thought that I'd be able to incorporate the unaccent package by simply replacing name by unaccent(name) in the above code, but that's throwing a syntax error in the create trigger code. How can I fix this to ensure that the tsvector column uses unaccented characters?

David Harbage
  • 687
  • 3
  • 8
  • 25
  • 1
    You'll need to modify the trigger function or write a simple one just for this table yourself. If you take a look at it you'll see that it's not just using the column name as an expression. (It'd help if you showed the exact error message text, and the exact SQL you ran to get the error; do this with all questions and also include your PostgreSQL version). – Craig Ringer Jul 29 '14 at 21:48
  • Plus, obviously, the definition of trigger function. – Erwin Brandstetter Jul 30 '14 at 00:31

1 Answers1

3

I know this is a really old ticket, but along with @ErwinBrandstetter 's incredible answer (https://stackoverflow.com/a/11007216/5261969) and some PostgreSQL doc reading (https://www.postgresql.org/docs/9.3/functions-textsearch.html), I finally figured this out.

This is what my migration looks like (using Rails 5.2):


    # Erwin's wrapper code
    connection.execute(<<-EOSQL)
      CREATE OR REPLACE FUNCTION public.immutable_unaccent(regdictionary, text)
        RETURNS text LANGUAGE c IMMUTABLE STRICT AS
        '$libdir/unaccent', 'unaccent_dict';
    EOSQL

    connection.execute(<<-EOSQL)
      CREATE OR REPLACE FUNCTION public.f_unaccent(text)
        RETURNS text LANGUAGE sql IMMUTABLE STRICT AS
      $func$
      SELECT public.immutable_unaccent(regdictionary 'unaccent', $1)
      $func$;
    EOSQL

    # new trigger using f_unaccent
    connection.execute(<<-EOSQL)
      CREATE TRIGGER artists_name_tsvector_update BEFORE INSERT OR UPDATE
      OF name ON artists FOR EACH ROW EXECUTE PROCEDURE
      tsvector_update_trigger(
        tsv_name, 'public.f_unaccent(name)', name
      );

      UPDATE artists SET tsv_name = (to_tsvector('public.f_unaccent(name)', name));
    EOSQL

This update allows users to search with special characters or not to yield the same results (e.g "Ønders" and "Onders" search query both return the correct artist.)

Note: I am also using I18n.transliterate to localize the incoming query param.

Updated - upon running tests, I found that my UPDATE code was incorrect, so I had to alter my code to get it working. I'll keep the above in case it helps someone

Updated migration using @EvanCarroll 's answer https://stackoverflow.com/a/50595181/5261969 - I couldn't figure out how to get the update trigger to stop complaining using Erwin's code. :/

# create extension unaccent
    connection.execute(<<-EOSQL)
      CREATE TEXT SEARCH CONFIGURATION f_unaccent ( COPY = simple );
      ALTER TEXT SEARCH CONFIGURATION f_unaccent
        ALTER MAPPING FOR hword, hword_part, word
        WITH unaccent, simple;
    EOSQL

    # create updated trigger using unaccent for tsv_name
    # update existing tsv_names
    connection.execute(<<-EOSQL)
      CREATE TRIGGER artists_name_tsvector_update BEFORE INSERT OR UPDATE
      OF name ON artists FOR EACH ROW EXECUTE PROCEDURE
      tsvector_update_trigger(
        tsv_name, 'public.f_unaccent', name
      );

      UPDATE artists SET tsv_name = (to_tsvector('public.f_unaccent', name));
    EOSQL
lechin
  • 66
  • 6