Gettext: Is it a good idea for the message ID to be the English text?

Question

We're getting ready to translate our PHP website into various languages, and the gettext support in PHP looks like the way to go.

All the tutorials I see recommend using the English text as the message ID, e.g.

gettext("Hi there!")

But is that really a good idea? Let's say someone in marketing wants to change the text to "Hi there, y'all!". Then don't you have to update all the language files because that string -- which is actually the message ID -- has changed?

Is it better to have some kind of generic ID, like "hello.message", and an English translation file?

There is a similar question: http://stackoverflow.com/questions/4232922/why-do-people-use-plain-english-as-translation-placeholders — sleske, Nov 28 '11 at 09:27

score 49 · Answer 1 · answered Feb 23 '09 at 21:59

Wow, I'm surprised that no one is advocating using the English as a key. I used this style in a couple of software projects, and IMHO it worked out pretty well. The code readability is great, and if you change an English string it becomes obvious that the message needs to be considered for re-translation (which is a good thing).

In the case that you're only correcting spelling or making some other change that definitely doesn't require translation, it's a simple matter to update the IDs for that string in the resource files.
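
To make the "update the IDs" step concrete, here is a minimal sketch. It models each language file as a plain PHP array (msgid => msgstr) rather than a real .po file, and the helper name `renameMsgid` is made up for illustration. With real .po files you would do the equivalent in a PO editor or with a script.

```php
<?php
/**
 * Hypothetical helper: rename a message ID in every language catalog while
 * keeping the existing translations. Catalogs are modelled as plain arrays
 * (msgid => msgstr), not as real .po files.
 */
function renameMsgid(array $catalogs, string $old, string $new): array
{
    if ($old === $new) {
        return $catalogs; // nothing to do
    }
    foreach ($catalogs as $locale => $messages) {
        if (array_key_exists($old, $messages)) {
            $messages[$new] = $messages[$old]; // carry the translation over
            unset($messages[$old]);
            $catalogs[$locale] = $messages;
        }
    }
    return $catalogs;
}

$catalogs = [
    'de' => ['Hi their!' => 'Hallo!'],
    'fr' => ['Hi their!' => 'Salut !'],
];

// A spelling fix in the source text: the translations themselves stay valid.
$catalogs = renameMsgid($catalogs, 'Hi their!', 'Hi there!');
```

This only covers the easy case where every use of the string changes in the same way.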

That said, I'm currently evaluating whether or not to carry this way of doing I18N forward to a new project, so it's good to hear some thoughts on why it might not be a good idea.

dcstraw

"In the case that you're only correcting spelling or making some other change that definitely doesn't require translation, it's a simple matter to update the IDs for that string in the resource files." Unfortunately, that's not always true. If the string is used in two places, and only one changes, you also need to duplicate the entry in the .po files. If the string is multiline, this change will require serious scripting. – sleske Nov 28 '11 at 09:38

I would also go with using the base-language strings as the msgids. With large dictionaries there is a performance hit when you also have to look up the strings for the default language (when you otherwise wouldn't have to). – Patrick Forget Sep 10 '13 at 18:32

score 23 · Answer 2 · answered Feb 09 '09 at 20:54

I strongly disagree with Richard Harrison's answer, in which he states that it is "the only way". Dear asker, do not trust an answer that claims to be the only way, because the "only way" doesn't exist.

Here is another way, which IMHO has a few advantages over Richard's approach:

Start by using a proto-version of the English string as the original (the msgid).
Don't display these proto-strings; create a translation file for English nonetheless.
To begin with, copy the proto-strings into the English translation.

Advantages:

readable code
text in your code is very close if not identical to what your view displays
if you want to change the English text, you don't change the proto-string but the translation
if you want to translate the same thing twice, just write a slightly different proto-string or add 'version for this and that', and you still have perfectly readable code
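
A rough sketch of the proto-string idea, assuming plain PHP arrays in place of compiled catalogs. The lookup function `tr` and the sample translations are invented for illustration and are not gettext API.

```php
<?php
// Sketch only: arrays stand in for compiled .po files.
// The proto-string in the code is the key; even English gets a catalog.
$catalogs = [
    'en_US' => ['Hi there!' => "Hi there, y'all!"], // marketing edits only this value
    'de_DE' => ['Hi there!' => 'Hallo zusammen!'],
];

/** Hypothetical lookup: return the translation, or the proto-string if none exists. */
function tr(array $catalogs, string $locale, string $proto): string
{
    return $catalogs[$locale][$proto] ?? $proto;
}

echo tr($catalogs, 'en_US', 'Hi there!'), "\n"; // Hi there, y'all!
echo tr($catalogs, 'de_DE', 'Hi there!'), "\n"; // Hallo zusammen!
echo tr($catalogs, 'fr_FR', 'Hi there!'), "\n"; // Hi there! (no catalog, proto-string shown)
```

Changing the displayed English text touches only the `en_US` entry, so the key that the other languages are matched on stays the same.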

I think this should be the accepted answer. The key point here is that even if you have English in the source code, you still create `en_GB` and `en_US` or whatever English variants you really want to support. Weblate calls the source code version "Developer English", which is a pretty good solution. I'd also recommend using `npgettext()`, because in some cases the same English word can be used as a verb or a noun, and other languages may not use the same word for both. The Developer English version should have "Go to modify view", "Modifying the content", and "Save modifications", while en_US would have "Modify" three times. — Mikko Rantalainen, Jun 08 '21 at 08:24

score 22 · Accepted Answer · answered Oct 19 '08 at 19:00

I use meaningful IDs such as "welcome_back_1", which would be "welcome back, %1", etc. I always have English as my "base" language, so in the worst-case scenario, when a specific language doesn't have a message ID, I fall back on English.

I don't like to use actual English phrases as message IDs, because if the English changes, so does the ID. This might not affect you much if you use automated tools, but it bothers me. I don't like to use simple codes (like msg3975) either, because they don't mean anything, so reading the code is more difficult unless you litter comments everywhere.
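
Here is a minimal sketch of the fallback described above. It assumes plain arrays in place of real catalogs, a made-up `message()` helper, and `%s` rather than `%1` as the placeholder so that `sprintf` can fill it in.

```php
<?php
// Sketch only: arrays stand in for real message catalogs.
$catalogs = [
    'en' => ['welcome_back_1' => 'Welcome back, %s'],
    'de' => ['welcome_back_1' => 'Willkommen zurück, %s'],
    'fr' => [], // not translated yet
];

/** Hypothetical lookup: requested locale first, then English, then the raw ID. */
function message(array $catalogs, string $locale, string $id): string
{
    return $catalogs[$locale][$id]
        ?? $catalogs['en'][$id]
        ?? $id;
}

echo sprintf(message($catalogs, 'de', 'welcome_back_1'), 'Ann'), "\n"; // Willkommen zurück, Ann
echo sprintf(message($catalogs, 'fr', 'welcome_back_1'), 'Ann'), "\n"; // Welcome back, Ann
```

Plain gettext does not do the middle step for you: when a msgid has no translation it returns the msgid itself, so with key-style IDs the English fallback has to be added on top.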

chroder

But what about PO files? They say that `msgid` should be the untranslated string. Let's say we deviate and use short keywords instead of English messages. The problem then is: how would the translators get to know the actual English strings? The PO files now contain only the keys and not the actual messages. – Kushagra Gour Apr 17 '13 at 12:12

"Kushagra Gour" - this is not a problem. You already have the source text, so just put it in the corresponding PO file (say en_US.po, if your native source text is in English), along with the structured keys. Here the structured key is the msgid, and the msgstr holds the native source text. Then just let translators make proper PO files for their language. Translators would just copy the msgid/msgstr strings and replace the msgstr with the translated text. The problem of structured keys and translators not being able to read them is frequently repeated, but is in fact non-existent. – Tomas Bilka May 12 '14 at 22:10
The reason for using English phrases, not keywords, as the msgid is that translators are normally not developers. They don't know or understand the source text. This is also the reason to use whole phrases, as words can change meaning with context. The PO file is meant to be translated as is by a person (or algorithm) that is fluent in English and the destination language. Also: plural forms can only be supported with real source phrases. The gettext system is much more than just mapping keys to text. It's about the lifecycle of an application. – Martin M Jan 29 '21 at 10:15

score 12 · Answer 4 · answered Oct 19 '08 at 15:44

The reason for the IDs being English is so that the ID is returned if the translation fails for whatever reason - the translation for the current language and token not being available, or other errors. That of course assumes the developer is writing the original English text, not some documentation person.

Also if the English text changes then probably the other translations need to be updated?

In practice we also use pure IDs rather than the English text, but it does mean we have to do lots of extra work to default to English.

+1 [Look](http://stackoverflow.com/questions/2790952/php-localization-best-practices-gettext?rq=1) at ZZ coder's answer :) — CoR, May 15 '13 at 12:11

score 10 · Answer 5 · answered Oct 16 '17 at 19:59

There is a lot to consider, and the answer is not so easy.

Using plain English

Pros

Easy to write and READ code
In most cases, it works even without running translation functions in code

Cons

The programmers involved must also be good copywriters :)
You need to write correct, precise texts fully in English, even when the first language you need to ship is something else (e.g. we start a lot of projects in Czech and localize them to English later).
In a lot of cases, you need to use contexts. If you fail to do so from the beginning, it's a lot of work to add them later. To explain: in English, one word can have many different meanings, and you need contexts to differentiate them, which is not always easy ("order" can mean sort order or purchase order).
It can be very hard to correct the English later in the process. Corrections to the source strings will very often lead to the loss of already translated phrases. It's very frustrating to lose translations into three different languages just because you corrected the English.
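
The "order" example can be sketched with contexts. As far as I know, PHP's gettext extension has no `pgettext()`, and the usual workaround relies on GNU gettext storing a context-qualified entry under the key context + "\x04" + msgid. The sketch below mimics that key format with a plain array. The `withContext` helper, the context names and the German strings are illustrative assumptions.

```php
<?php
// Sketch only: an array stands in for a compiled catalog. The key format
// "context\x04msgid" mirrors how GNU gettext joins msgctxt and msgid.
$de = [
    "sorting\x04Order"  => 'Reihenfolge',
    "purchase\x04Order" => 'Bestellung',
];

/** Hypothetical pgettext-style lookup; falls back to the bare msgid. */
function withContext(array $catalog, string $context, string $msgid): string
{
    return $catalog[$context . "\x04" . $msgid] ?? $msgid;
}

echo withContext($de, 'sorting', 'Order'), "\n";  // Reihenfolge
echo withContext($de, 'purchase', 'Order'), "\n"; // Bestellung
echo withContext($de, 'military', 'Order'), "\n"; // Order (no entry for this context)
```

Adding a context later changes the lookup key, which is why retrofitting contexts invalidates existing translations for that string.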

Using keys

Pros

You can use localization platform functions even for the English language. For example, we're using the lovely Crowdin platform. There are a lot of handy tools, or rather a complete workflow, for translation management: voting for different translations, translation history, glossaries (which help to keep the translation/language coherent), proofing, approval, etc. Using keys makes this process much smoother.
It's much easier to send English texts for proofreading etc. Usually, it's not a good idea to let copywriters modify your code directly :)

Cons

More complicated project setup.
Harder to use %d, %s etc.
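
One reason placeholders get harder with keys is that the call site no longer shows which `%d` or `%s` the text expects. A check along these lines might help. It is only a sketch: the arrays stand in for real catalogs, the key name and helper names are invented, and the pattern only recognises `%d`, `%s` and their numbered forms such as `%1$s`.

```php
<?php
// Sketch only: arrays stand in for real catalogs; the key name is invented.
$catalogs = [
    'en' => ['cart.items' => '%d items in the cart of %s'],
    'cs' => ['cart.items' => '%d položek v košíku'], // %s is missing here
];

/** Extract printf-style conversions such as %d, %s or %1$s from a message. */
function placeholders(string $message): array
{
    preg_match_all('/%(?:\d+\$)?[ds]/', $message, $matches);
    $found = $matches[0];
    sort($found); // order-independent comparison
    return $found;
}

/** Hypothetical check: keys whose translation uses different placeholders than English. */
function mismatchedKeys(array $catalogs, string $locale): array
{
    $bad = [];
    foreach ($catalogs['en'] as $key => $english) {
        $translated = $catalogs[$locale][$key] ?? null;
        if ($translated !== null && placeholders($translated) !== placeholders($english)) {
            $bad[] = $key;
        }
    }
    return $bad;
}

print_r(mismatchedKeys($catalogs, 'cs')); // lists cart.items
```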

score 6 · Answer 6 · answered Jul 31 '09 at 00:22

In a word: don't do this.

The same word/phrase in English can often enough have more than one meaning, and each meaning a different translation.

Define mnemonic IDs for your strings, and treat English as just another language.

Agree with other posters that ID numbers in code are a nightmare for code readability.

Ex localisation engineer

But what about PO files? They say that `msgid` should be the untranslated string. Let's say we deviate and use short keywords instead of English messages. The problem then is: how would the translators get to know the actual English strings? The PO files now contain only the keys and not the actual messages. – Kushagra Gour Apr 17 '13 at 12:22

score 4 · Answer 7 · answered Oct 19 '08 at 14:36

Haven't you already answered your own question? :)

Clearly, if you intend to support i18n of your application, you should treat all the language implementations the same. If someone decides a string needs to change, you make a similar change in all the language files. The metadata with the checkin should group all the language files together in the same change. If your "default" language is handled differently, that makes it more difficult to maintain.

David M. Karr

That's assuming you have a German, Japanese, Chinese, Arabic, etc speaker ready to translate at all times during your development cycle. That has never been my experience. On projects I have worked on, we change the original text (English), then aggregate the changes at the end of the cycle. – dcstraw Feb 23 '09 at 21:47

Timo Huovinen · Answer 8 · 2016-01-30T11:19:12.703

At the end of the day, a translator should be able to sit down and change the texts for every language (so they match in meaning) without having to involve the programmer that already did his/her job.

This makes me feel like the proper answer is to use a modified version of gettext where you put strings like this

_(id, backup_text, context)

_('ABOUT_ME', 'About Me', 'HOMEPAGE')

context being optional
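
A possible shape for such a function, purely as a sketch: this is not part of gettext, and `makeTranslator`, the in-memory catalog and the German string are all assumptions made for illustration.

```php
<?php
/**
 * Hypothetical factory for the proposed call style (id, backup text, context).
 * $translations is an in-memory catalog keyed by context, then by ID;
 * entries without a context live under the empty-string key.
 */
function makeTranslator(array $translations): callable
{
    return function (string $id, string $backupText, ?string $context = null) use ($translations): string {
        return $translations[$context ?? ''][$id] ?? $backupText;
    };
}

$t = makeTranslator([
    'HOMEPAGE' => ['ABOUT_ME' => 'Über mich'],
]);

echo $t('ABOUT_ME', 'About Me', 'HOMEPAGE'), "\n"; // Über mich
echo $t('CONTACT', 'Contact', 'HOMEPAGE'), "\n";   // Contact (backup text)
echo $t('ABOUT_ME', 'About Me'), "\n";             // About Me (no context-less entry)
```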

Why like this? Because you need to identify text in the system using unique IDs, not English text that could be repeated elsewhere.

You should also keep the backup, ID and context in the same place in your code to reduce discrepancies.

The IDs also have to be readable, which brings in the problem of synonyms and duplicate use (even as IDs). We could prefix the IDs like this: "HOMEPAGE_ABOUT_ME" or "MAIL_LETTER", but

people forget to do this at the start and changing it later is a problem
it's more flexible for the system to be able to group both by ID and context

which is why I also added the context variable at the end.

The backup text can be pretty much anything; it could even be "[ABOUT_ME@HOMEPAGE text failed to load, please contact example@example.com]".

It won't work with current gettext editing programs like "poedit", but I think you can define custom function names for translations, like just "t()" without the underscore at the start.

I know that gettext also has support for contexts, but it's not very well documented or widely used.

P.S. I'm not sure about the best variable order to enforce good and extendable code so suggestions are welcome.

I agree with this. This would be the perfect setup! Shame in 2020 we still don't have anything near like this... — tvb, Oct 07 '20 at 11:50

score 2 · Answer 9 · answered Oct 19 '08 at 16:20

I'd go so far as to say that you never (for most values of never) want to use free text as keys to anything. Imagine if SO used the query title as key to this page for instance. If someone links to it, and then the title is edited, the link is no longer valid.

Your problem is similar, except you would also be responsible for updating all links...

Like Douglas Leeder mentions, what you probably want to do is use English as the default (backup) language, although an interface that uses English and another language intermixed is highly confusing (but mildly amusing, too).

Berserk

Links are different than messages. If the original text of a message changes, you don't necessarily want the same translated text to appear because the meaning of the message might be different. It's better to examine the message at that point to see if retranslation is necessary. – dcstraw Feb 23 '09 at 21:52
Unfortunately gettext makes setting a default language **really** hard. – Douglas Leeder Oct 08 '09 at 12:28
I agree with you, but having the free text in the URL is really useful, which is why SO has it. So why not apply the same solution SO uses for URLs to gettext? Why not give gettext both the ID and the free text at the same time: the ID to fetch the translation, and the free text as the backup for when there is no translation? – Timo Huovinen Jan 30 '16 at 11:17

score 0 · Answer 10 · answered Dec 20 '12 at 18:55

In addition to the considerations above, there are many cases where you'd want the "key" (msgid) to be different from the source text (English). For example, in the HTML view, I might want to say [yyyy] where the destination and label of that anchor tag depend on the locale of the user. E.g. it might be a link to a social network, and in the US it would be Facebook but in China it would be Weibo. So the msgids might be something like socialSiteUrl and socialSiteLabel.
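
As a rough illustration of that case, assuming arrays in place of per-locale catalogs: the `socialLink` helper and the concrete URLs are my own placeholders, not something prescribed by gettext.

```php
<?php
// Sketch only: arrays stand in for per-locale catalogs; URLs are illustrative.
$catalogs = [
    'en_US' => ['socialSiteUrl' => 'https://www.facebook.com/', 'socialSiteLabel' => 'Facebook'],
    'zh_CN' => ['socialSiteUrl' => 'https://weibo.com/',        'socialSiteLabel' => 'Weibo'],
];

/** Hypothetical helper: build the anchor tag for the user's locale (en_US as default). */
function socialLink(array $catalogs, string $locale): string
{
    $messages = $catalogs[$locale] ?? $catalogs['en_US'];
    return sprintf(
        '<a href="%s">%s</a>',
        htmlspecialchars($messages['socialSiteUrl']),
        htmlspecialchars($messages['socialSiteLabel'])
    );
}

echo socialLink($catalogs, 'zh_CN'), "\n"; // <a href="https://weibo.com/">Weibo</a>
```

Here the English text could not serve as the key, because the "translation" is a different site, not a rendering of the same words.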

I use a mix.

For basic strings that I don't think will have conflicts/changes/weird meanings, I'll make the key be the same as the English.

score 0 · Answer 11 · answered Mar 27 '20 at 00:51

We use Dutch. The strings should be written in the native language of the writer; this makes communication with translators less error-prone, since the writers can communicate with them in their native language.

Cochise Ruhulessin
