2

I've been having some problems trying to save a string word with limited varchar(9).

create database big_text
    LOCALE 'en_US.utf8'
    ENCODING UTF8
    
    
create table big_text(
    description VARCHAR(9) not null
)

# OK
insert into big_text (description) values ('sintético')

# I Got error here
insert into big_text (description) values ('sintético')

I already know that the problem is because one word is using 'é' -> Latin small letter E with Acute (this case only have 1 codepoint) and another word is using 'é' -> Latin Small Letter E + Combining Acute Accent Modifier. (this case I have 2 codepoint).

How can I store the same word using both representation in a limited varchar(9)? There is some configuration that the database is able to understand both ways? I thought that database being UTF8 is enough but not.

I appreciate any explanation that could help me understand where am I wrong? Thank you!

edit: Actually I would like to know if there is any way for postgres automatically normalize for me.

1 Answers1

1

A possible workaround using CHECK to do the character length constraint.

show lc_ctype;

  lc_ctype   
-------------
 en_US.UTF-8

create table big_text(
    description VARCHAR not null CHECK (length(normalize(description)) <= 9) 
)

-- Note shortened string. Explanation below.
select 'sintético'::varchar(9);
 varchar  
----------
 sintétic

insert into big_text values ('sintético');
INSERT 0 1

select description, length(description) from big_text;
 description | length 
-------------+--------
 sintético   |     10

insert into big_text values ('sintético test');
ERROR:  new row for relation "big_text" violates check constraint "big_text_description_check"
DETAIL:  Failing row contains (sintético test).


From here Character type the explanation for the string truncation vs the error you got when inserting:

An attempt to store a longer string into a column of these types will result in an error, unless the excess characters are all spaces, in which case the string will be truncated to the maximum length.(This somewhat bizarre exception is required by the SQL standard.)

If one explicitly casts a value to character varying(n) or character(n), then an over-length value will be truncated to n characters without raising an error. (This too is required by the SQL standard.)

Adrian Klaver
  • 15,886
  • 2
  • 17
  • 28
  • Thank you for the answer! Do you know if there is any way for postgres automatically normalize for me when I insert a value? – allan.egidio Sep 28 '22 at 17:18
  • Not that I know of, probably because 1) There are four forms(NFC (the default), NFD, NFKC, or NFKD) that can be used. 2) It only works with UTF8. – Adrian Klaver Sep 28 '22 at 21:11