I have this string (Decimal NCRs): &#26085;&#26412;&#12398;&#37756;&#28792;&#12392;&#12399;
It represents the Japanese text 日本の鍼灸とは.
But I need (UTF-8): %E6%97%A5%E6%9C%AC%E3%81%AE%E9%8D%BC%E7%81%B8%E3%81%A8%E3%81%AF
For the first character: &#26085; ⇒ 日 ⇒ %E6%97%A5
This site does it, but how do I get this in Perl? (If possible, in a single regex like s/\&\#([0-9]+);/uc('%'.unpack("H2", pack("c", $1)))/eg;)
http://www.endmemo.com/unicode/unicodeconverter.php
Also, I need to convert it back again from UTF-8 to Decimal NCRs.
I've been breaking my head over this one for half a day now. Any help is greatly appreciated!
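    #!/usr/bin/perl
    use strict;
    use warnings;

    use HTML::Entities qw( decode_entities );
    use URI::Escape    qw( uri_escape_utf8 );

    # Decimal NCR for 日 (U+65E5)
    my $html = '&#26085;';
    # Turn the entity into a Perl character string
    my $text = decode_entities($html);
    # Encode as UTF-8 and percent-escape the bytes
    my $uri_component = uri_escape_utf8($text);
    print $uri_component."\n";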
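The script above only goes from NCRs to percent-encoded UTF-8. For the way back, and for the regex-style approach the question asks about, here is a minimal sketch that uses only the core `Encode` module. The sub names `ncr_to_uri` and `uri_to_ncr` are made up for illustration. It assumes the input is plain ASCII apart from the NCRs or `%XX` escapes, and it leaves all other characters (spaces, `&`, and so on) untouched, so it is not a general URI escaper.

```perl
use strict;
use warnings;
use Encode qw( encode decode );

# Decimal NCRs -> percent-encoded UTF-8 (hypothetical helper).
# Only the &#NNNN; sequences are rewritten; everything else is left as-is.
sub ncr_to_uri {
    my ($s) = @_;
    $s =~ s{&#([0-9]+);}{
        # code point -> UTF-8 bytes -> one %XX per byte
        join '', map { sprintf '%%%02X', ord($_) } split //, encode('UTF-8', chr $1)
    }ge;
    return $s;
}

# Percent-encoded UTF-8 -> decimal NCRs (hypothetical helper).
sub uri_to_ncr {
    my ($s) = @_;
    # %XX -> raw byte
    $s =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
    # raw UTF-8 bytes -> Perl character string
    my $text = decode('UTF-8', $s);
    # every non-ASCII character -> &#NNNN;
    $text =~ s/([^\x00-\x7F])/'&#' . ord($1) . ';'/ge;
    return $text;
}

print ncr_to_uri('&#26085;'), "\n";    # %E6%97%A5
print uri_to_ncr('%E6%97%A5'), "\n";   # &#26085;
```

The regex in the question cannot work as a single `pack("c", ...)` because one code point becomes several UTF-8 bytes (three for 日). The code point has to be encoded to UTF-8 first, and only then can each byte be hex-escaped.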
I get `panic: utf16_to_utf8: odd bytelen 53 at jp.pl line 12.` – Eesger Mar 19 '15 at 13:53