Automatic gettext translation generator for testing (pseudolocalization)

Question

I'm currently in the process of making a site i18n-aware, marking hardcoded strings as translatable.

I wonder if there's any automated tool that would let me browse the site and quickly see which strings are marked and which still aren't. I saw a few projects like django-i18n-helper that try to highlight translated strings using HTML facilities, but this doesn't work well with JavaScript.

So I thought FДЦЖ CУЯILLIC, or ʇxǝʇ uʍop-ǝpısdn (or something along those lines) should do the trick. Easy to distinguish visually, still readable, yet doesn't depend on any rich text formatting besides Unicode support.

The problem is, I can't find any readily available tool that'd eat gettext .po/.pot file(s) and spew out such a translation. Still, I think the idea is pretty obvious, so there must be something out there already.

In my case I'm using Python/Django, but I suppose this question applies to anything that uses a gettext-compatible library. The only thing the tool should be aware of is that there could be HTML fragments in translation strings.

It seems that there is no good tool for this - none readily publicly available. I used msgfilter with sed for a few test runs, but I ended up using Crowdin's pseudolocalization feature. Both approaches resulted in partially broken translation files (some placeholders were broken), but I fixed those by hand. – drdaeman, Aug 06 '15 at 12:23
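Broken placeholders of the kind mentioned in this comment can be spotted mechanically. The following is only a rough sketch, not part of the original thread: it assumes the (msgid, msgstr) pairs have already been read from the .po file by some other means, and the names `PLACEHOLDER`, `placeholders` and `broken_entries` are made up for illustration. It only knows about printf-style and `str.format`-style placeholders.

```python
import re

# Hypothetical pattern: covers common Python placeholders only.
PLACEHOLDER = re.compile(
    r"%(?:\(\w+\))?[-#0+]*\d*(?:\.\d+)?[a-zA-Z]"  # printf-style: %s, %d, %(name)s
    r"|\{\w*\}"                                    # str.format-style: {}, {count}
)


def placeholders(text):
    """Return the placeholders found in text, sorted so order does not matter."""
    return sorted(PLACEHOLDER.findall(text))


def broken_entries(pairs):
    """Return the (msgid, msgstr) pairs whose placeholders differ."""
    return [(msgid, msgstr) for msgid, msgstr in pairs
            if placeholders(msgid) != placeholders(msgstr)]


# An uppercasing filter turns %(name)s into %(NAME)S, which this check flags.
sample = [
    ("Hello %(name)s", "HELLO %(NAME)S"),
    ("%d items", "%d ITEMS"),
]
for msgid, msgstr in broken_entries(sample):
    print("placeholder mismatch:", repr(msgid), "->", repr(msgstr))
```

Sorting the placeholders means a translation that merely reorders them is not reported, which is usually what you want for named placeholders.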

Wander Nauta · Accepted Answer · 2015-07-23T10:23:40.870

The msgfilter program will let you run your translations through any program you want. It works especially well with GNU sed.

For example, to turn all your translations into uppercase (HTML is mostly case-insensitive, so this should work):

msgfilter -i django.po sed -e 's/\(.*\)/\U\1/'

The only strings in your app that have lowercase letters in them would then be the hardcoded ones.

If you really want to do faux cyrillic, you just have to write a program or script that reads Latin and outputs that, and feed that program to msgfilter instead of sed.
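As a rough illustration of such a script (a sketch under assumptions, not a finished tool): the filter below swaps some Latin letters for accented look-alikes instead of faux Cyrillic, and leaves HTML tags, HTML entities and common Python placeholders untouched. The letter table, the `PROTECTED` pattern and the function names are all invented for this example.

```python
import re
import sys

# Hypothetical letter table: each pair is "plain letter" + "replacement".
PAIRS = "aå eé ií oö uü cç nñ sš yý zž AÅ EÉ IÍ OÖ UÜ CÇ NÑ SŠ YÝ ZŽ"
TABLE = str.maketrans({pair[0]: pair[1:] for pair in PAIRS.split()})

# Spans that must survive unchanged (an assumption about what the strings contain).
PROTECTED = re.compile(
    r"<[^>]+>"                                     # HTML tags
    r"|&#?\w+;"                                    # HTML entities such as &amp;
    r"|%(?:\(\w+\))?[-#0+]*\d*(?:\.\d+)?[a-zA-Z%]"  # printf-style placeholders
    r"|\{[^{}]*\}"                                 # str.format-style placeholders
)


def pseudolocalize(text):
    """Mangle letters outside protected spans; copy protected spans as-is."""
    out = []
    pos = 0
    for match in PROTECTED.finditer(text):
        out.append(text[pos:match.start()].translate(TABLE))
        out.append(match.group(0))
        pos = match.end()
    out.append(text[pos:].translate(TABLE))
    return "".join(out)


def main():
    """Filter mode: msgfilter pipes each msgstr to stdin and reads stdout."""
    sys.stdout.write(pseudolocalize(sys.stdin.read()))
```

Saved as a hypothetical pseudo.py that ends with a call to `main()`, it could be run as `msgfilter -i django.po -o pseudo.po python3 pseudo.py`. The tag pattern is deliberately crude, so a literal `<` followed later by `>` in ordinary text would be left unmangled as well.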

If your distribution has a talkfilters package, it might provide a few programs that might be useful in this specific case. All of these should work as msgfilter filters. (My personal favorite is chef. Bork bork bork!)

edited Jul 23 '15 at 10:23

answered Jul 23 '15 at 10:05

Wander Nauta

Thanks. I think this solves at least half of the issue. Sadly, filters/talkfilters don't provide Unicode string mangling (like blackletter/zalgo/upside-down etc), and `chef`/`pirate`/etc aren't very visually distinguishable, so not exactly what I was looking for. And, moreover, they all don't seem to care about HTML markup and would happily make ` – drdaeman Jul 23 '15 at 11:03
Nice idea. Almost, yes. Except for places where someone decided the text has to SHOUT AT USER, but luckily those aren't very frequent. – drdaeman Jul 23 '15 at 12:20

Quick addition for the googlers: it seems that if you want your .po file (e.g. django.po) to be changed in place, you would need to write: `msgfilter -i django.po -o django.po sed -e 's/\(.*\)/\U\1/'` – Largo Jul 11 '16 at 15:32

score 0 · Answer 2 · answered Aug 06 '15 at 13:51

I haven't tried this myself yet, but I found the podebug tool from the Translate Toolkit. Based on its documentation (the flipped and unicode rewrite options), it looks like exactly the tool I wished for.

answered Aug 06 '15 at 13:51

drdaeman
