3

How to store multilingual strings in postgresql?

This question isn't such as:

  1. Best way to store images in PostgreSQL [duplicate]
  2. How to design a database schema for storing text in multiple languages?
  3. Best practise for storing multilingual strings

I'm want insert strings with japanese, korean, chineses (simplified and traditional) and russian characters in one table.

INSERT INTO articles (title, text, date)
VALUES ('article title', 'Long text with chinese (龙), korean(울), japanese(形) and russian(ЯД) characters', '2017-02-16');

How to create Database in Postgresql with right LC_COLLATE and LC_CTYPE for multilingual strings in tables?

CREATE DATABASE "articles"
       WITH OWNER "postgres"
       ENCODING 'UTF8'
       LC_COLLATE = 'zh_CN.UTF-8'
       LC_CTYPE = 'zh_CN.UTF-8'
       TEMPLATE = template0;

OR

CREATE DATABASE "articles"
       WITH OWNER "postgres"
       ENCODING 'UTF8'
       LC_COLLATE = 'zh_TW.UTF-8'
       LC_CTYPE = 'zh_TW.UTF-8'
       TEMPLATE = template0;

OR

CREATE DATABASE "articles"
       WITH OWNER "postgres"
       ENCODING 'UTF8'
       LC_COLLATE = 'ko_KR.UTF-8'
       LC_CTYPE = 'ko_KR.UTF-8'
       TEMPLATE = template0;

OR

CREATE DATABASE "articles"
       WITH OWNER "postgres"
       ENCODING 'UTF8'
       LC_COLLATE = 'ja_JP.UTF-8'
       LC_CTYPE = 'ja_JP.UTF-8'
       TEMPLATE = template0;

OR

CREATE DATABASE "articles"
       WITH OWNER "postgres"
       ENCODING 'UTF8'
       LC_COLLATE = 'ru_RU.UTF-8'
       LC_CTYPE = 'ru_RU.UTF-8'
       TEMPLATE = template0;

What is best practice for store multilingual text in postgresql?

Community
  • 1
  • 1
  • 2
    That depends on what your goals are. You can simply *store* data in any language without worrying about the collate and ctype settings at all. Those are only relevant if you expect very specific behaviour in terms of comparisons and sorting. – deceze Feb 16 '17 at 11:14
  • 1
    You can define the collation on a per column level as well. –  Feb 16 '17 at 11:34
  • @deceze Thank You. That is all I really need to know. – Roman Pesterev Feb 17 '17 at 01:30
  • @deceze, but [here](https://www.postgresql.org/docs/current/static/collation.html) they write that collation influences sorting, comparison and other DB functions as well. Would the program having Postgres as a DB-backend misbehave on a wrong location? E.g. if we specify `en_US` collation for japan-korean-chinese database? – Suncatcher May 20 '18 at 10:01
  • @Suncatcher Again, it depends on what you want. Collation is only relevant for collating actions, like sorting and comparison. For an English user, you might want to sort CJK text at the end, for CJK users the opposite. You can decide that on a per-query basis. – deceze May 20 '18 at 10:33
  • `it depends on what you want`, let's take following use-case: [app](https://en.wikipedia.org/wiki/Gramps) uses Postgres as backend and speaks to it through [psycopg2](http://initd.org/psycopg/) API. Will it process calls to Postgres incorrectly if the database is multilingual (4+ languages including non-Latin characters) and the database uses `en_US` collation? – Suncatcher May 20 '18 at 10:41
  • @Suncatcher What do you mean by “incorrectly”? The data may be *sorted* or *compared* differently depending on the set collation. Which is **correct** depends on what you expect. – deceze May 20 '18 at 11:11
  • Incorrectly is when I select ascending order and it shows descending, or shows `B` before `A` – Suncatcher May 20 '18 at 11:17
  • @Suncatcher Well, it won’t randomly sort in reverse or anything like that. But again, it will sort CJK before English or vice versa depending on your set collation. That wouldn’t be *incorrect*, it would just not be as you desire and you need to change the collation in your query then. – deceze May 20 '18 at 11:43

1 Answers1

-1

To store multiple languages in one table i used the below code and its working

CREATE TABLE TESTING1(COL VARCHAR(100))

INSERT INTO TESTING1 (COL)
SELECT 'ЯНВАРЬ';

Here, insert the russian text you can insert any language data like japenese, koren, telgu, etc.

please refer the below link for more information

cursorrux
  • 1,382
  • 4
  • 9
  • 20