19

If a website is localized/internationalized with a Simplified Chinese translation...

  • Is it possible to reliably automatically convert the text to Traditional Chinese in a high quality way?
  • If so, is it going to be extremely high quality or just a good starting point for a translator to tweak?
  • Are there open source tools (ideally in PHP) to do such a conversion?
  • Is the conversion better one way vs. the other (simplified -> traditional, or vice versa)?
Makoto
philfreo

6 Answers

23

Short answer: No, not both reliably and at high quality. I wouldn't recommend automated tools unless the market isn't that important to you and you can risk some publicly embarrassing flubs. Some localization firms are happy to start from a quality Simplified Chinese translation and adapt it to Traditional, but many prefer to start from the English source.

Longer answer: In some cases only the glyphs differ, and the two forms have different Unicode code points. But there are also idiomatic and vocabulary differences between the PRC and Taiwan/Hong Kong, and your quality will suffer if these aren't handled. Technical terms may be more or less problematic, depending on the era in which they came into common use. Automated tools may catch some of these issues, but not all of them. If you do go the route of automatic conversion, make sure you get sign-off from QA teams based in each of your target markets.
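To make the glyph-versus-vocabulary distinction concrete, here is a minimal sketch. The two tables (`$charMap`, `$taiwanTerms`) are hypothetical, hand-picked entries for illustration; a real converter would need thousands of them. It shows how a glyph-only pass can produce valid Traditional characters while still using the PRC word for "software".

```php
<?php
// Hypothetical, hand-picked tables for illustration only.
// Character-level map: Simplified glyph => Traditional glyph.
$charMap = [
    '软' => '軟',
    '体' => '體',
];

// Vocabulary-level map: PRC term => term preferred in Taiwan.
$taiwanTerms = [
    '软件' => '軟體', // "software"
];

$source = '软件';

// Glyph-only conversion: valid Traditional characters, PRC vocabulary.
$glyphOnly = strtr($source, $charMap);

// Term-aware conversion: strtr() tries the longest key first, so the
// whole-word entry wins over the single-character entries.
$termAware = strtr($source, $taiwanTerms + $charMap);

echo $glyphOnly, "\n"; // 軟件
echo $termAware, "\n"; // 軟體
```

Which vocabulary table applies depends on the target market, which is one reason a reviewer in each market is still needed.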

There are sociopolitical concerns as well. For example, you can use terms like "Republic of China" in Taiwan, but this will royally piss off the Chinese government if it appears in your simplified Chinese version (and sometimes your English version); if you have an actual subsidiary or partner in China, the staff may be arrested solely on the basis of subversive terminology. (This is not unique to China; Pakistan/India and Turkey have similar issues.) You can get into similar trouble by referring to "Taiwan" as a "country."

JasonTrue
  • Thanks, makes a lot of sense. If one was to want to start with an automated translation, how could that be done, programmatically? – philfreo May 15 '11 at 04:11
  • Because my internationalization experience was at a company where it was impractical to use machine translation (Microsoft), I don't have a lot of information on fully automated translation. You should be aware, though, that most localization firms use a translation memory system (something like Trados) that lets them translate frequently occurring terms consistently, and the more projects they've done, the more beneficial such a system is. It's not quite automated, but it does dramatically shorten their delivery time and generally reduces costs compared with completely manual localization. – JasonTrue May 15 '11 at 05:20
11

As a native Hong Konger myself, I concur with @JasonTrue: don't do it. You risk angering and offending your potential users in Taiwan and Hong Kong.

BUT, if you still insist on doing so, have a look at how Wikipedia does it; here is one implementation (note license).

spacehunt
5

Is it possible to reliably automatically convert the text to Traditional Chinese in a high quality way?

Other answers focus on the difficulties, but these are exaggerated. First, a substantial portion of the characters are exactly the same. Second, the 'simplified' forms are exactly that: simplified forms of the traditional characters. That means there is mostly a one-to-one relationship between traditional and simplified characters.

If so, is it going to be extremely high quality or just a good starting point for a translator to tweak?

A few things will need tweaking.

Are there open source tools (ideally in PHP) to do such a conversion?

Not that I am aware of, though you might want to check out the Google Translate API.

Is the conversion better one way vs. the other (simplified -> traditional, or vice versa)?

A few characters lost their distinction in the simplified character set. For instance, 麵 (flour) was simplified to the same character as 面 (face, side). For this reason traditional->simplified would be slightly more accurate.
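A rough sketch of why the direction matters, assuming a hypothetical three-entry table (`$toSimplified`): going from Traditional to Simplified, every character has exactly one answer, but inverting the table leaves some Simplified characters with several possible sources.

```php
<?php
// Hypothetical three-entry table: Traditional => Simplified.
$toSimplified = [
    '麵' => '面', // flour, noodles
    '面' => '面', // face, side (unchanged by simplification)
    '漢' => '汉',
];

// Traditional -> Simplified: every key has exactly one answer.
echo strtr('麵', $toSimplified), "\n"; // 面

// Simplified -> Traditional: group the candidates for each Simplified form.
$candidates = [];
foreach ($toSimplified as $traditional => $simplified) {
    $candidates[$simplified][] = $traditional;
}

// Keep only the Simplified characters with more than one possible source.
$ambiguous = array_filter($candidates, function ($list) {
    return count($list) > 1;
});

print_r($ambiguous); // 面 maps back to both 麵 and 面
```

Picking the right candidate for an ambiguous character needs context, which a plain per-character table does not have.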

I'd also like to point out that traditional characters are not solely in use in Taiwan. They can also be found in HK and occasionally even in the mainland.


I was able to find this and this. Need to create an account to download, though. Never used the site myself so I cannot vouch for it.

jisaacstone
  • 3
    If you're evaluating the results based on *what proportion of characters were translated correctly*, then yes many of the characters have one-to-one mappings, and yes traditional->simplified is slightly better. But we're talking about *language* here, not statistics. Character-based conversion is naive and wrong. I'm sure there are better automated systems out there -- the suggestion to check out the google translate api is a good start. – Todd Owen Jun 06 '11 at 02:42
  • ...although it seems Google has deprecated the translate api as of a week ago (http://code.google.com/apis/language/translate/overview.html). – Todd Owen Jun 06 '11 at 02:50
  • 2
    And ultimately, I concur with @JasonTrue that if the Chinese-speaking market presents any value to your business at all, then you need a professional translator to look at it. Or if you *really* want to trade off quality for price, run it through automatic conversion and then get an educated native speaker to proof read the results. – Todd Owen Jun 06 '11 at 03:00
4

Fundamentally, simplified Chinese words have a lot of missing meanings. No programming language in the world will be able to accurately convert simplified Chinese into traditional Chinese. You will just cause confusion for your intended audience (Hong Kong, Macau, Taiwan).

A perfect example of failed conversion from simplified Chinese to traditional Chinese is the character "后". In the simplified form, it has two meanings, "behind" or "queen". When you convert it back to traditional Chinese, however, there are two candidate characters: 後 "behind" or 后 "queen". One funny example I came across is a converter that turned "皇后大道" Queen's Road into "皇後大道", which literally means Queen's Behind Road.
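One common way to reduce this kind of error is to check phrase-level exceptions before falling back to single characters. The sketch below is only an illustration under that assumption: the function `toTraditional()` and both tables are hypothetical, and a real phrase table would have to be much larger and would still miss cases.

```php
<?php
// Hypothetical tables for illustration; real ones are far larger.
// Default per-character choice: 后 usually means "behind/after" (後).
$charMap = ['后' => '後'];

// Phrase-level exceptions where 后 means "queen" and must stay 后.
$phraseMap = ['皇后' => '皇后'];

function toTraditional(string $text, array $phraseMap, array $charMap): string
{
    // strtr() with an array tries the longest keys first and never
    // re-scans text it has already replaced, so phrase entries take
    // priority over single-character entries.
    return strtr($text, $phraseMap + $charMap);
}

echo toTraditional('皇后大道', [], $charMap), "\n";         // 皇後大道 (wrong)
echo toTraditional('皇后大道', $phraseMap, $charMap), "\n"; // 皇后大道 (right)
echo toTraditional('以后', $phraseMap, $charMap), "\n";     // 以後
```

This only fixes the phrases someone thought to list, so the output still needs a native reader to check it.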

Unless your translation algorithm is super smart, it is bound to produce errors. So you're better off hiring a very good translator who's fluent in both types of Chinese.

Lok Yan Wong
0

Short answer: Yes. And it's easy. First convert the text from UTF-8 to BIG5, then use one of the many tools that convert BIG5 to GBK, then convert GBK to UTF-8.
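A small check worth running before relying on this route, assuming your PHP build has iconv with BIG5 support: re-encoding alone changes the bytes but not the characters, so any change of script has to come from the mapping tool in the middle step.

```php
<?php
// Re-encode Traditional text as Big5 and back again.
$original  = '漢字';
$asBig5    = iconv('UTF-8', 'BIG5', $original);
$roundTrip = iconv('BIG5', 'UTF-8', $asBig5);

// The bytes differ between encodings, but the characters do not.
var_dump(strlen($original), strlen($asBig5)); // 6 bytes in UTF-8, 4 in Big5
var_dump($roundTrip === $original);           // true
```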

Zhang Buzz
0

I know nothing about any form of Chinese, but by looking at the examples in this Wikipedia page I'm inclined to think that automatic conversion is possible, since many of the phrases seem to use the same number of characters and even some of the same characters.

I ran a quick test using a multibyte ord() function and I can't see any patterns that would allow the automatic conversion without the use of a (huge?) lookup translation table.

Traditional Chinese 漢字
Simplified Chinese  汉字

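// Return the Unicode code point of the first character of a UTF-8 string.
// Named utf8_ord() because PHP 7.2+ ships a built-in mb_ord() in mbstring.
function utf8_ord($string)
{
    // UCS-4BE encodes each character as a 4-byte big-endian integer.
    $ucs4 = iconv('UTF-8', 'UCS-4BE', $string);

    if ($ucs4 === false || strlen($ucs4) < 4)
    {
        return false;
    }

    $result = unpack('N', $ucs4);

    return $result[1];
}

var_dump(utf8_ord('漢'), utf8_ord('字')); // 28450, 23383
var_dump(utf8_ord('汉'), utf8_ord('字')); // 27721, 23383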
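To illustrate the "no pattern" observation, here is a short sketch that compares code-point distances for a few Traditional/Simplified pairs. The pairs are hand-picked for illustration, and the snippet assumes the mbstring extension's built-in `mb_ord()` (PHP 7.2+). If a fixed offset existed, all three distances would be equal.

```php
<?php
// Traditional/Simplified pairs; requires the mbstring extension (PHP 7.2+).
$pairs = ['漢' => '汉', '語' => '语', '國' => '国'];

$offsets = [];
foreach ($pairs as $traditional => $simplified) {
    // Distance between the two code points.
    $offsets[$traditional] = mb_ord($traditional, 'UTF-8') - mb_ord($simplified, 'UTF-8');
}

print_r($offsets); // 漢 => 729, 語 => -335, 國 => 14
```

The distances differ in both size and sign, which is consistent with needing a lookup table rather than arithmetic.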

This might be a good place to start building the LUTT:

I got to this other linked answer that seems to agree (to some degree) with my reasoning:

There are several countries where Chinese is the main written language. The major difference between them is whether they use simplified or traditional characters, but there are also minor regional differences (in vocabulary, etc).

Glorfindel
Alix Axel