
How do I use Iconv in Ruby to convert a string from Simplified Chinese to Traditional Chinese (and vice versa)?

I've tried

Iconv.conv("gb2312//IGNORE", "big5//IGNORE", '大家一起學中文')

I get an entirely different string. When I try the GBK and BIG5 encodings instead, I get an IllegalSequence error.

Thanks.

alste
  • Which Ruby version do you use? Ruby 1.9 includes encoding handling. You may use `String#encode('Big5')` to convert to Big5. I can't test it: I only get little squares when I run a test program (Chinese fonts seem to be missing on my PC). – knut Nov 10 '11 at 20:15
  • I tried it out; unfortunately, it doesn't work. – alste Nov 10 '11 at 20:59

3 Answers


I just wrote a gem for this: https://rubygems.org/gems/tradsim

To install the gem:

gem install tradsim

To use the gem:

# encoding: UTF-8
require 'tradsim'
puts Tradsim::to_sim("大家一起學中文")

It will yield:

大家一起学中文

You can use Tradsim::to_trad to do the reverse.

erinata
  • Your gem is inaccurate in my tests. – c2h2 Aug 29 '13 at 12:21
  • Can you give me an example, so that I know whether it's due to a programming error or an imperfection in the translation table? The translation table is imperfect. For example, for the character 后, I can't tell whether it is Traditional Chinese (as in 皇后) or the Simplified form of 後 (as in 后来, Traditional 後來). I hope someone can suggest a good way to deal with characters that are identical in Simplified Chinese but distinct in Traditional Chinese. – erinata Nov 13 '13 at 05:51
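
One common way to handle the 后 ambiguity raised in the comment above is to try phrase-level entries before falling back to single characters. The following is only a minimal sketch of that idea: the tables and the `to_trad_sketch` method are hypothetical, hand-written for illustration, and are not part of the tradsim gem.

```ruby
# encoding: UTF-8

# Hypothetical, hand-written tables for illustration only; a real converter
# needs far larger dictionaries.
PHRASES = {
  "皇后" => "皇后",  # "empress": 后 stays 后 in Traditional Chinese
  "后来" => "後來"   # "afterwards": 后 becomes 後
}.freeze

CHARS = {
  "后" => "後",      # default guess when no phrase matches
  "来" => "來"
}.freeze

# Longer keys come first in the alternation, so phrases win over single characters.
PATTERN = Regexp.union((PHRASES.keys + CHARS.keys).sort_by { |key| -key.length })

# Replace every known phrase or character; anything else is left untouched.
def to_trad_sketch(text)
  text.gsub(PATTERN) { |match| PHRASES[match] || CHARS[match] }
end

puts to_trad_sketch("皇后后来来了")  # prints 皇后後來來了
```

The single-character fallback is still a guess, so a phrase missing from the table can be converted incorrectly; accuracy depends on the coverage of the phrase dictionary.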

OpenCC

https://github.com/BYVoid/OpenCC

As of 2021, this seems to be the most popular choice:

sudo apt install opencc
opencc -i input.txt -o output.txt -c t2s.json

With:

input.txt

大家一起學中文

we get:

output.txt

大家一起学中文

It also has APIs for several languages like Python and Node.js.

Tested on Ubuntu 21.04, opencc 1.1.1.

Ciro Santilli OurBigBook.com

Are you trying to convert, say, 學 to 学? I could be wrong, but I don't think Iconv will perform that type of conversion.

Marnen Laibow-Koser
  • Yes, that's what I'm trying to do. Isn't it a simple Big5 to GB conversion? If Iconv can't do that, do you know how I can do it with Ruby? – alste Nov 10 '11 at 20:01
  • Any idea how? I tried `@text = "大家一起學中文" @c = Iconv.conv('gb2312', 'utf-8', @text) @c = Iconv.conv('big5', 'gb2312', @c) @c = Iconv.conv('utf-8', 'big5', @c)`, but it does not work (it gives me a different string). – alste Nov 10 '11 at 21:01
  • ...but I think not. All Iconv should do, if I understand correctly, is spit out the exact same thing in a different encoding. Check out http://www.fileformat.info/info/unicode/char/5b78/index.htm and notice that 學 exists in both Big5 and GB. 学, however, only exists in GB. Hmm again. I think higher-level routines are needed to perform the transformation, since this isn't just an encoding difference, but I'm not certain. – Marnen Laibow-Koser Nov 10 '11 at 21:04
  • There's some good discussion of the issues at http://stackoverflow.com/questions/5998607/conversion-from-simplified-to-traditional-chinese ; from that page, I was able to find http://mediawiki-zhconverter.googlecode.com/svn-history/r3/trunk/mediawiki-zhconverter.inc.php . It's PHP, but it should give you some idea of an algorithm. – Marnen Laibow-Koser Nov 10 '11 at 21:18
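
The distinction drawn in these comments can be sketched in Ruby 1.9+ with `String#encode`. Re-encoding through Big5 and GBK changes only the bytes, so the round trip should return the original characters, whereas producing 学 from 學 needs a character mapping. The `TRAD_TO_SIM` table below is a hypothetical one-entry example, not a real conversion table.

```ruby
# encoding: UTF-8

text = "大家一起學中文"

# Re-encoding changes the byte representation, not the characters themselves.
big5 = text.encode("Big5")
gbk  = big5.encode("GBK")
puts gbk.encode("UTF-8") == text  # the round trip compares equal to the input

# Hypothetical one-entry table, for illustration only.
TRAD_TO_SIM = { "學" => "学" }.freeze

# Map each character through the table, keeping characters it does not list.
simplified = text.gsub(/./) { |ch| TRAD_TO_SIM.fetch(ch, ch) }
puts simplified
```

A usable converter would need a full table plus some way to resolve characters that have more than one possible counterpart.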