2

The UNACCENT function can strip diacritics off characters. However, in my case, it can only strip characters with 1 diacritic, for e.g.

  • Thành
  • Supermän
  • äää

For characters with more than 1 diacritics, UNACCENT does nothing, for e.g.

  • Hồ
  • phố

Is there a way to let Postgres strip the accents from these characters?

Thanks

Khanetor
  • 11,595
  • 8
  • 40
  • 76

3 Answers3

4

PostgreSQL's unaccent module does not use Unicode normalization, but only a simple search-and-replace dictionary. The default dictionary, unaccent.rules, does not contain these Vietnamese characters, thus nothing is done.

You could create your own unaccent dictionary though. As explained in the documentation:

  1. Create a text file vietnamese.rules with content like

    ầ  a
    Ầ  A
    ồ  o
    Ồ  O
    
  2. Move vietnamese.rules into the folder $SHAREDIR/tsearch_data/ (usually /usr/share/postgresql/tsearch_data)

  3. Run the function as

    SELECT unaccent('vietnamese', 'Hồ ầ phố');
    --              ^~~~~~~~~~~~~
    
kennytm
  • 510,854
  • 105
  • 1,084
  • 1,005
  • I was afraid to create my own rule since I don't have access to the DB computer, but I guess there is no avoiding it. – Khanetor Apr 08 '17 at 12:23
  • I'm sorry, but I got this error ` ERROR: text search dictionary "vietnamese" does not exist LINE 1: SELECT unaccent('vietnamese', 'Hồ ầ phố'); ` – Hom nom nom nom ... Jan 21 '19 at 13:23
  • @Homnomnomnom... Just submitted an edit addressing your problem. You need to run the appropriate `CREATE TEXT SEARCH DICTIONARY` command. – cstork Nov 13 '21 at 14:30
3

You can create a new function to do unaccent job like below:

CREATE OR REPLACE FUNCTION public.vietnamese_unaccent(text)
 RETURNS text
 LANGUAGE plpgsql
AS $function$
DECLARE
    input_string text := $1;
BEGIN

input_string := translate(input_string, 'áàãạảAÁÀÃẠẢăắằẵặẳĂẮẰẴẶẲâầấẫậẩÂẤẦẪẬẨ', 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa');
input_string := translate(input_string, 'éèẽẹẻEÉÈẼẸẺêếềễệểÊẾỀỄỆỂ', 'eeeeeeeeeeeeeeeeeeeeeeee');
input_string := translate(input_string, 'íìĩịỉIÍÌĨỊỈ', 'iiiiiiiiiii');
input_string := translate(input_string, 'óòõọỏOÓÒÕỌỎôốồỗộổÔỐỒỖỘỔơớờỡợởƠỚỜỠỢỞ', 'ooooooooooooooooooooooooooooooooooo');
input_string := translate(input_string, 'úùũụủUÚÙŨỤỦưứừữựửƯỨỪỮỰỬ', 'uuuuuuuuuuuuuuuuuuuuuuu');
input_string := translate(input_string, 'ýỳỹỵỷYÝỲỸỴỶ', 'yyyyyyyyyyy');
input_string := translate(input_string, 'dđĐD', 'dddd');

return input_string;
END;
$function$

It's works for me!

Nanuno
  • 31
  • 2
0

If you use from version 13 onwards, the Postgresql already support that function here:

select normalize('hồ, phố, ầ', NFC) → 'ho, pho, a' -- NFC (the default), NFD, NFKC, or NFKD.

Document: https://www.postgresql.org/docs/13/functions-string.html

T PH KH
  • 21
  • 1
  • 5