Consider the string \U00000045\U00000301, that is, the letter E (U+0045) followed by a combining acute accent (U+0301).

1) https://www.fileformat.info/info/unicode/char/0045/index.htm
2) https://www.fileformat.info/info/unicode/char/0301/index.htm

Would a column constrained by varchar(1) treat it as a valid 1-character input, or would it be rejected because it is considered a 2-character input?

How does SQL generally measure the length of strings that contain graphemes made up of multiple code points?

AlanSTACK
  • https://stackoverflow.com/questions/4249745/does-postgresql-varchar-count-using-unicode-character-length-or-ascii-character – user2864740 Jan 13 '18 at 02:09
  • @user2864740 That post doesn't talk about graphemes, just Unicode code points. Since a grapheme can consist of multiple Unicode code points that must be interpreted as a single graphical character, my question remains unanswered. – AlanSTACK Jan 13 '18 at 02:38
  • It has code that demonstrates *how* to answer the question. It also mentions different database character encodings, which may be something relevant in context. – user2864740 Jan 13 '18 at 02:38

1 Answer

I probably look silly with this query, but still:

t=# with c(u) as (values( e'\U00000045\U00000301'))
select u, u::varchar(1), u::varchar(2),char_length(u), octet_length(u) from c;
 u | u | u | char_length | octet_length
---+---+---+-------------+--------------
 É | E | É |           2 |            3
(1 row)
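
The output above is consistent with counting code points rather than graphemes: char_length is 2 and octet_length is 3. Note that the explicit cast u::varchar(1) truncates silently, whereas the PostgreSQL documentation says that storing an over-length value into a varchar(n) column raises an error (unless the excess characters are all spaces). As a rough illustration outside the database, the Python sketch below reproduces the same counts and shows that NFC normalization happens to collapse this particular pair into the single code point U+00C9. It is not part of the original query, and the variable names are my own.

```python
import unicodedata

# E (U+0045) followed by COMBINING ACUTE ACCENT (U+0301), as in the question.
decomposed = "\U00000045\U00000301"

# Python's len() counts code points, like char_length in a UTF8 database.
code_points = len(decomposed)

# UTF-8 size: 1 byte for E plus 2 bytes for U+0301, like octet_length.
utf8_bytes = len(decomposed.encode("utf-8"))

# NFC normalization composes the pair into the precomposed character U+00C9.
composed = unicodedata.normalize("NFC", decomposed)

print(code_points, utf8_bytes)               # 2 3
print(len(composed), composed == "\u00c9")   # 1 True
```

Normalizing before storing is only a partial workaround: many combining sequences have no precomposed form, so their code point count stays above 1 even after NFC.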

Edit: the server encoding and database settings for the session above:

t=# show server_encoding ;
 server_encoding
-----------------
 UTF8
(1 row)

t=# \l+ t
                                        List of databases
 Name | Owner | Encoding | Collate | Ctype | Access privileges | Size  | Tablespace | Description
------+-------+----------+---------+-------+-------------------+-------+------------+-------------
 t    | vao   | UTF8     | C       | UTF-8 |                   | 51 MB | pg_default |
(1 row)
Vao Tsun