I need to store different versions of not very long strings for different languages (2-4 languages) in a Postgres table.

What is the best way of doing that? Array or JSON or something like that?

Erwin Brandstetter
loginpassword

1 Answer


First make sure that the database locale can deal with different languages. Use a UTF-8 server-encoding. Optionally, set LC_COLLATE = 'C' to be on neutral ground, or use the collation of your main language to have a default sort order. Start by reading the chapter Collation Support in the manual.
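For example, a new database with UTF-8 encoding and a neutral collation might be created like this (the database name and the ctype locale are placeholders; `template0` is needed when the locale differs from the template database):

CREATE DATABASE mydb
  TEMPLATE   template0
  ENCODING   'UTF8'
  LC_COLLATE 'C'
  LC_CTYPE   'en_US.utf8';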

I strongly suggest using the latest version of Postgres (9.1 at the time of writing) for its superior collation support.

As for the table structure: keep it simple. It sounds like there is a low, fixed number of languages to deal with, so you could simply have one column per language:

CREATE TABLE txt (
  txt_id serial PRIMARY KEY
, txt    text NOT NULL -- master language NOT NULL?
, txt_fr text -- others can be NULL?
, txt_es text
, txt_de text
);
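Retrieving a string then typically falls back to the master language with COALESCE. A sketch, assuming you want the French version:

SELECT txt_id, COALESCE(txt_fr, txt) AS display_txt
FROM   txt;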

This is pretty efficient, even with many languages. NULL storage is very cheap.
If you have a varying number of languages to deal with, or many updates to individual language strings, a separate table may be the better solution. This design assumes a "primary language" whose string is always present:

CREATE TABLE txt (
  txt_id serial PRIMARY KEY
, txt    text NOT NULL -- master language NOT NULL?
);

CREATE TABLE lang (
  lang_abbr text PRIMARY KEY -- de, es, fr, ...
, lang      text NOT NULL
, note      text
);

CREATE TABLE txt_trans (
  txt_id    int  REFERENCES txt(txt_id) ON UPDATE CASCADE ON DELETE CASCADE
, lang_abbr text REFERENCES lang(lang_abbr) ON UPDATE CASCADE
, txt       text NOT NULL -- the translated string
, CONSTRAINT txt_trans_pkey PRIMARY KEY (txt_id, lang_abbr)
);
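With this layout, a query can fetch the translation for a given language and fall back to the master string where no translation exists. A sketch, assuming German ('de') as the requested language:

SELECT t.txt_id, COALESCE(tr.txt, t.txt) AS display_txt
FROM   txt t
LEFT   JOIN txt_trans tr ON tr.txt_id = t.txt_id
                        AND tr.lang_abbr = 'de';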

Alternatively, not treating the master language as special and keeping all language variants in the same table might make handling in your app simpler. But it really depends on your requirements.

Erwin Brandstetter
  • What do you think about using the JSON data type to store the additional values, keyed by locale code? – Jeremy Baker Oct 02 '16 at 21:47
  • @JeremyBaker: The data type `json` is a good solution for a big pool of possibly varying attributes. Not so much for a handful of well-known attributes (like in this example). It depends on the complete picture. When this question was asked, Postgres 9.2 had very basic json support. Things have improved a lot since, not least by adding `jsonb` ... – Erwin Brandstetter Oct 03 '16 at 01:58
  • 5
    One thing I wish you'd done was touch on the possibility of using JSONB fields to store multiple languages `{"en_US": "hello", "fr": "bonjour"}` – Soviut Aug 21 '18 at 06:33
  • 2
    2nd case looks good, but what we gonna do if where are many "txts": catalogue, products, tags, so on... – Serge Jan 30 '20 at 08:16
  • I store all language variants in a JSONB column, as in Soviut's comment, e.g. `{"en": "hello", "de": "Hallo", "cs": "Ahoj"}`, but ran into a problem querying this column. I have a case where I need to find a prefix match in any of the languages and do it efficiently. So far I only have something like this: `....WHERE title '$.* ? (@ like_regex "^:query.*")` – Saintan Jan 17 '21 at 22:43
  • @Saintan we are considering storing the data like you did and searching it in Elasticsearch; we'll never touch the RDBMS for searching purposes. – Udlei Nati Mar 30 '22 at 17:04
  • Would you do it like this today, or have you discovered a better solution? – Stefan Falk Mar 24 '23 at 10:44