34

This may be a stupid question, but here goes.

I've seen several projects using some translation library (e.g. gettext) working with plain English placeholders. So for example:

_("Please enter your name");

instead of abstract placeholders (which has always been my instinctive preference)

_("error_please_enter_name");

I have seen various recommendations on SO to work with the former method, but I don't understand why. What I don't get is what do you do if you need to change the english wording? Because if the actual text is used as the key for all existing translations, you would have to edit all the translations, too, and change each key. Or don't you?

Isn't that awfully cumbersome? Why is this the industry standard?

It's definitely not proper normalization to do it this way. Are there massive advantages to this method that I'm not seeing?

Arnaud Meuret
Pekka
  • possible duplicate of [Gettext: Is it a good idea for the message ID to be the english text?](http://stackoverflow.com/questions/216478/gettext-is-it-a-good-idea-for-the-message-id-to-be-the-english-text) – PhoneixS Mar 24 '14 at 08:12

8 Answers

32

Yes, you have to alter the existing translation files, and that is a good thing.

If you change the English wording, the translations probably need to change, too. Even if they don't, you need someone who speaks the other language to check.

You prep a new version, and part of the QA process is checking the translations. If the English wording changed and nobody checked the translation, it'll stick out like a sore thumb and it'll get fixed.
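
To make the "sticks out like a sore thumb" point concrete, here is a minimal sketch of such a check. It assumes the catalog is a plain dict rather than a real `.po` file, and `find_untranslated` and `find_obsolete` are hypothetical helpers, not part of gettext.

```python
# Hypothetical data: message IDs currently used in the source code.
# The first one was reworded from "Please enter your name".
source_msgids = {
    "Please enter your full name",
    "File not found",
}

# Hypothetical German catalog, still keyed by the old English wording.
german_catalog = {
    "Please enter your name": "Bitte geben Sie Ihren Namen ein",
    "File not found": "Datei nicht gefunden",
}


def find_untranslated(msgids, catalog):
    """Return source strings that have no entry in the catalog."""
    return sorted(m for m in msgids if m not in catalog)


def find_obsolete(msgids, catalog):
    """Return catalog keys that no longer appear in the source."""
    return sorted(k for k in catalog if k not in msgids)


untranslated = find_untranslated(source_msgids, german_catalog)
obsolete = find_obsolete(source_msgids, german_catalog)
print(untranslated)
print(obsolete)
```

The reworded string shows up as untranslated and the old wording shows up as obsolete, so the changed entry cannot slip through QA unnoticed.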

Nicholas Knight
  • +1 Excellent point regarding the need for the translations to break when altered. – Orbling Nov 20 '10 at 13:40
  • 5
    Of course, there *are* cases where the translation does not need to change, namely if the change to the English wording does not change its meaning (trivial rewording, spelling fix, whitespace change). In that case, the need to check all translations is indeed a problem. – sleske Nov 28 '11 at 09:19
  • 1
    @sleske This probably comes a bit late, but the rewording/spelling problem can be solved by creating a translation file for the base language with just that string changed; that way you don't need to change the message id. – lordscales91 Mar 13 '16 at 09:31
  • 1
    What about using the same id in a lot of places? Imagine an error message that is used twenty or more times. Using a plain English `msgid` would force you to change it at every usage in your code and not just once in a `.PO` file. – Ben Lime Jun 12 '18 at 19:03
20
  1. The main language already exists: you don't need to translate it.
  2. Translators have better context with a real sentence than vague placeholders.
  3. The placeholders are just the keys; it's still possible to change the original language by creating a translation for it. When a translation doesn't exist, the placeholder itself is used as the translated text.
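
Point 3 can be sketched with Python's standard `gettext` module. This is only an illustration: `DictTranslations` is a hypothetical in-memory stand-in for a compiled catalog, and the reworded English text is made up for the example.

```python
import gettext


class DictTranslations(gettext.NullTranslations):
    """Hypothetical in-memory catalog; a real project would load .mo files."""

    def __init__(self, table):
        super().__init__()
        self._table = dict(table)

    def gettext(self, message):
        if message in self._table:
            return self._table[message]
        # With no fallback set, NullTranslations.gettext returns the
        # message ID itself.
        return super().gettext(message)


# No catalog at all: the key doubles as the displayed text.
no_catalog = gettext.NullTranslations()
fallback_text = no_catalog.gettext("Please enter your name")

# An "English" catalog that rewords the text without touching the key.
english = DictTranslations(
    {"Please enter your name": "Please enter your full name"}
)
reworded_text = english.gettext("Please enter your name")

print(fallback_text)
print(reworded_text)
```

The key in the source code stays the same in both cases, so the other translation files are not affected by the rewording.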
Savageman
  • 3
    Note that 2. is not really an issue - if you use placeholders, translators just need to use a tool that shows the original text alongside the placeholders. Still, gettext makes this a bit easier by not requiring such a tool. – sleske Nov 28 '11 at 09:26
8

We've been using abstract placeholders for a while and it was pretty annoying having to write everything twice when creating a new function. When English is the placeholder, you just write the code in English, you have meaningful output from the start and don't have to think about naming placeholders.

So my reason would be less work for the developers.

che
  • Yes, that's the reason according to the gettext docs: "GNU gettext is designed to minimize the impact of internationalization on program sources, keeping this impact as small and hardly noticeable as possible" (from the gettext manual, http://www.gnu.org/software/gettext/manual/gettext.html#Why ). – sleske Nov 28 '11 at 09:22
5

I like your second approach. When translating texts you always have the problem of homonyms. For example, 'open' can describe the state of a window but is also the verb for performing the action. In other languages these homonyms may not exist. That's why you should be able to add meaning to your placeholders. The best approach is to put this meaning in your text library. If this is not possible on the platform or framework you use, it might be a good idea to define a 'development language'. This language adds meaning to the text entries, like 'action_open' and 'state_open'. You will of course have to put extra effort into translating this language into plain English (or the language you develop for). I have applied this philosophy in some large projects, and in the long run it saves time (and headaches).

The best way, in my opinion, is to keep the meaning separate from the text. If you develop your own translation library, or the one you use supports it, you can do something like this:

_(i18n("Please enter your name", "error_please_enter_name"));

Where:

i18n(text, meaning)
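
For comparison, gettext's message contexts pair a text with a meaning in a similar way. Below is a minimal sketch using the `pgettext(context, message)` method from Python's standard `gettext` module (Python 3.8 or later). `ContextTranslations` and its German entries are hypothetical; a real project would load them from a compiled catalog.

```python
import gettext


class ContextTranslations(gettext.NullTranslations):
    """Hypothetical in-memory catalog keyed by (context, msgid)."""

    def __init__(self, table):
        super().__init__()
        self._table = dict(table)

    def pgettext(self, context, message):
        key = (context, message)
        if key in self._table:
            return self._table[key]
        # No entry for this context: fall back to the message ID.
        return super().pgettext(context, message)


german = ContextTranslations(
    {
        ("action", "Open"): "Öffnen",    # verb: perform the action
        ("state", "Open"): "Geöffnet",   # adjective: state of a window
    }
)

action_label = german.pgettext("action", "Open")
state_label = german.pgettext("state", "Open")
unknown_label = german.pgettext("menu", "Open")

print(action_label, state_label, unknown_label)
```

The English text stays readable in the source while the context string carries the disambiguating meaning.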
Jan
  • 1
    Now that I think about it, that is the main reason why I tend to prefer that method: You can add unlimited context into the word, making it easier to get it right when translating. Well put, +1 – Pekka Nov 20 '10 at 13:46
  • 8
    Actually, `gettext` has mechanisms to handle this, called "contexts". See the gettext manual, "11.2.5 Using contexts for solving ambiguities". http://www.gnu.org/software/gettext/manual/gettext.html#Contexts – sleske Nov 28 '11 at 09:24
4

Interesting question. I assume the main reason is that you don't have to care about translation or localization files during development as the main language is in the code itself.

ThiefMaster
  • 1
    yup, that might be the main reason - somewhat understandable for lone programmers and small teams, but it becomes a huge pain when you have, say, twenty translation files maintained by different people. That's why I don't understand why this seems to be some kind of industry standard – Pekka Nov 20 '10 at 13:33
3

Quite old question but one additional reason I haven't seen in the answers yet:

You could end up with more placeholders than necessary, and thus more work for translators and possibly inconsistent translations. However, good editors like Poedit or Gtranslator can probably help with that.

To stick with your example: The text "Please enter your name" could appear in a different context in a different template (that the developer is most likely not aware of and shouldn't need to be). E.g. it could be used not as an error but as a prompt like a placeholder of an input field.

If you use

_("Please enter your name");

it would be reusable: a developer who is unaware of the already existing key for the error message would intuitively use the same text.

However, if you used

_("error_please_enter_name");

in a previous template, developers wouldn't necessarily be aware of it and would make up a second key (most likely according to a predefined wording scheme to not end up in complete chaos), e.g.

_("prompt_please_enter_name");

which then has to be translated again.
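
A tiny sketch of the difference, assuming each distinct key becomes one catalog entry for the translator. The lists of call sites are made up for illustration.

```python
# Keys used at two call sites (an error message and an input prompt).
english_call_sites = [
    "Please enter your name",   # error template
    "Please enter your name",   # prompt template, written independently
]
abstract_call_sites = [
    "error_please_enter_name",
    "prompt_please_enter_name",
]

# Each distinct key is one entry that has to be translated.
entries_with_english_keys = len(set(english_call_sites))
entries_with_abstract_keys = len(set(abstract_call_sites))

print(entries_with_english_keys, entries_with_abstract_keys)
```

With English keys the two templates collapse into one entry; with abstract keys the same sentence is translated twice.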

So I think that doesn't scale very well. A pre-agreed wording scheme of suffixes/prefixes e.g. for contexts can never be as precise as the text itself I think (either too verbose or too general, beforehand you don't know and afterwards it's difficult to change) and is more work for the developer that's not worth it IMHO.

Does anybody agree/disagree?

benebun
3

Well it probably is just that it's easier to read, and so easier to translate. I'm of the opinion that your way is best for scalability, but it does just require that extra bit of effort, which some developers might not consider worth it... and for some projects, it probably isn't.

Nathan MacInnes
3

There's a fallback hierarchy, from most specific locale to the unlocalised version in the source code.

So French in France might have the following fallback route:

  1. fr_FR
  2. fr
  3. Unlocalised. Source code.

As a result, having proper English sentences in the source code ensures that if a particular translation is not provided in step (1) or (2), you will at least get a proper, understandable sentence rather than random programmer garbage like “error_file_not_found”.
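
The fallback route can be sketched with `add_fallback` from Python's standard `gettext` module. `DictTranslations` is a hypothetical in-memory stand-in for real catalogs, and the French strings are illustrative only.

```python
import gettext


class DictTranslations(gettext.NullTranslations):
    """Hypothetical in-memory catalog; a real project would load .mo files."""

    def __init__(self, table):
        super().__init__()
        self._table = dict(table)

    def gettext(self, message):
        if message in self._table:
            return self._table[message]
        # Ask the next catalog in the chain, or return the message ID
        # unchanged when the chain is exhausted.
        return super().gettext(message)


fr_fr = DictTranslations({"File not found": "Fichier introuvable"})
fr = DictTranslations(
    {
        "File not found": "Fichier non trouvé",
        "Please enter your name": "Veuillez saisir votre nom",
    }
)
fr_fr.add_fallback(fr)  # chain: fr_FR, then fr, then the source text

from_fr_fr = fr_fr.gettext("File not found")
from_fr = fr_fr.gettext("Please enter your name")
from_source = fr_fr.gettext("Settings saved")

print(from_fr_fr, from_fr, from_source, sep=" | ")
```

The last lookup finds nothing in either catalog, so the user sees the English sentence from the source code.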

Plus, what do you do if it is a format string: “Sorry but the %s does not exist” ? Worse still: “Written %s entries to %s, total size: %d” ?
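
A short sketch of the format string problem in Python, assuming a lookup that finds no translation and returns the key unchanged. `translate_missing` and the key `error_does_not_exist` are hypothetical.

```python
def translate_missing(msgid):
    """Stand-in for a lookup that finds no translation for msgid."""
    return msgid


# English key: the untranslated fallback is still a usable format string.
english_result = translate_missing("Sorry but the %s does not exist") % "file"

# Abstract key: the fallback has no %s, so formatting fails.
try:
    abstract_result = translate_missing("error_does_not_exist") % "file"
except TypeError:
    abstract_result = None

print(english_result)
print(abstract_result)
```

In Python the mismatch raises a `TypeError`; in other languages the failure mode can be different.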

user268396
  • I personally prefer to see the garbage in that case so I don't overlook the problem, or I even tell my translation engine to throw an error. Still, this is valid info and surely one strong reason to work this way – Pekka Nov 20 '10 at 13:51
  • Format string errors - especially when using %s - don't lead to garbage but random crashes which are pretty annoying. – ThiefMaster Nov 20 '10 at 15:11