0

Can you point me to a tool for converting Japanese characters to Unicode?

Makoto
TopCoder

4 Answers

2

CPAN has "Unicode::Japanese", which should be a helpful starting point. For more information, also look at the article on Character Encodings in Perl and the Perl documentation on Unicode.

brian d foy
Space
1

See http://p3rl.org/UNI.

use Encode qw(decode encode);
my $bytes_in_sjis_encoding = "\x88\xea\x93\xf1\x8e\x4f";
my $unicode_string = decode('Shift_JIS', $bytes_in_sjis_encoding); # returns 一二三
my $bytes_in_utf8_encoding = encode('UTF-8', $unicode_string); # returns "\xe4\xb8\x80\xe4\xba\x8c\xe4\xb8\x89"
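
If the goal is to see the Unicode code points themselves, the decoded string can be inspected character by character. This is a minimal sketch that reuses the Shift_JIS bytes from the snippet above; the helper name `code_points` is made up for illustration and is not part of Encode.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: format each character of a decoded string as U+XXXX.
sub code_points {
    my ($string) = @_;
    return join ' ', map { sprintf 'U+%04X', ord($_) } split //, $string;
}

my $sjis_bytes = "\x88\xea\x93\xf1\x8e\x4f";        # three kanji in Shift_JIS
my $decoded    = decode('Shift_JIS', $sjis_bytes);  # character string, not bytes
print code_points($decoded), "\n";                  # U+4E00 U+4E8C U+4E09
```

Note that `ord` only gives meaningful code points on the decoded string; applied to the raw bytes it would report the individual byte values instead.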

For batch conversion from the command line, use piconv:

piconv -f Shift_JIS -t UTF-8 < infile > outfile
daxim
0

First, you need to find out the encoding of the source text if you don't know it already.

The most common encodings for Japanese are:

  1. euc-jp: often used on Unix systems and some web pages; it has greater Kanji coverage than shift-jis.
  2. shift-jis: Microsoft added some extensions to it, and the result is called cp932, which is often used by non-Unicode Windows programs.
  3. iso-2022-jp: a distant third.

A common encoding conversion library available from many languages is iconv (see http://en.wikipedia.org/wiki/Iconv and, for Perl, http://search.cpan.org/~mpiotr/Text-Iconv-1.7/Iconv.pm). It supports many other encodings in addition to the Japanese ones.
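
When the source encoding is unknown, one possible way to find it from Perl is the core module Encode::Guess (swapped in here instead of iconv, which does not detect encodings). The following is only a sketch: the helper name `decode_japanese` is hypothetical, and guessing is a heuristic that can fail or be ambiguous on short inputs, since some byte sequences are valid in both euc-jp and shift-jis.

```perl
use strict;
use warnings;
use Encode::Guess;

# Hypothetical helper: try the common Japanese encodings and decode the data.
# Returns the decoded string and the name of the encoding that matched.
sub decode_japanese {
    my ($bytes) = @_;
    my $guess = guess_encoding($bytes, qw(euc-jp shiftjis 7bit-jis));
    # On failure (no match, or more than one candidate) guess_encoding
    # returns an error message string instead of an encoding object.
    die "Cannot determine encoding: $guess\n" unless ref $guess;
    return ($guess->decode($bytes), $guess->name);
}

my ($text, $encoding) = decode_japanese("\x88\xea\x93\xf1\x8e\x4f");
print "Guessed $encoding, ", length($text), " characters\n";
```

If the encoding is known in advance, passing it explicitly to `decode` is more reliable than guessing.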

cryo
0

This question seems a bit vague to me; I'm not sure what you're asking. Usually you would use something like this:

open my $file, "<:encoding(cp932)", "JapaneseFile.txt"
    or die "Cannot open JapaneseFile.txt: $!";

to open a file with Japanese characters. Perl then decodes the data into its internal Unicode representation as you read from the filehandle.
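
As an illustration of the same idea applied end to end, the sketch below reads a cp932 file through a decoding layer and writes it back out through a UTF-8 encoding layer. The helper name `convert_file` and the temporary file names are assumptions made for this example only.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: re-encode a text file from one encoding to another.
sub convert_file {
    my ($in_path, $in_encoding, $out_path, $out_encoding) = @_;
    open my $in, "<:encoding($in_encoding)", $in_path
        or die "Cannot read $in_path: $!";
    open my $out, ">:encoding($out_encoding)", $out_path
        or die "Cannot write $out_path: $!";
    while (my $line = <$in>) {
        print {$out} $line;    # decoded on read, encoded on write
    }
    close $out or die "Cannot close $out_path: $!";
    close $in;
    return;
}

# Demo with a throwaway file holding three kanji as cp932 bytes.
my $dir = tempdir(CLEANUP => 1);
my ($sjis_file, $utf8_file) = ("$dir/in.txt", "$dir/out.txt");
open my $fh, '>:raw', $sjis_file or die "Cannot write $sjis_file: $!";
print {$fh} "\x88\xea\x93\xf1\x8e\x4f\n";
close $fh or die "Cannot close $sjis_file: $!";

convert_file($sjis_file, 'cp932', $utf8_file, 'UTF-8');
```

Inside the loop, `$line` is a character string, so any text processing (regexes, `length`, `substr`) operates on characters rather than bytes.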