I'm being passed a string such as:
my $x = "Zakłady Kuźnicze";
If you look closer, you can see that each of the two non-ASCII letters (ł and ź) is actually made up of two bytes:
foreach (split(//, $x)) { print $_.' '.ord($_)."\n"; }
Z 90
a 97
k 107
� 197
� 130
a 97
d 100
y 121
32
K 75
u 117
� 197
� 186
n 110
i 105
c 99
z 122
e 101
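As a sanity check that these byte pairs are UTF-8 and not some other encoding, here is a minimal sketch. It assumes the input is UTF-8 and uses the core Encode module; the helper name describe_bytes is made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode a UTF-8 byte string and report what it contains.
sub describe_bytes {
    my ($bytes) = @_;
    my $byte_count = length($bytes);
    my $chars      = decode('UTF-8', $bytes);   # assumes the bytes are valid UTF-8
    my @points     = map { sprintf('U+%04X', ord($_)) } split(//, $chars);
    return sprintf('%d bytes -> %d character(s): %s',
                   $byte_count, length($chars), join(' ', @points));
}

print describe_bytes("\xC5\x82"), "\n";   # the pair printed as 197 130 above
print describe_bytes("\xC5\xBA"), "\n";   # the pair printed as 197 186 above
# prints:
# 2 bytes -> 1 character(s): U+0142
# 2 bytes -> 1 character(s): U+017A
```

So each pair does seem to be a single code point once it is decoded as UTF-8.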
I want to convert this to encoded HTML using the codes described here: https://www.w3schools.com/charsets/ref_utf_latin_extended_a.asp
So I need a function such that:
print encode_it($x)."\n";
yields:
Zak&#322;ady Ku&#378;nicze
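To be explicit about the format I am after: each non-ASCII character should become a decimal numeric character reference built from its Unicode code point, as listed on the page linked above. A small sketch of just that formatting step follows. The helper name entity_for is hypothetical, and the sketch assumes the code point is already known, which is exactly the part I cannot get from my input.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of any module): format one Unicode code point
# as a decimal numeric character reference.
sub entity_for {
    my ($code_point) = @_;
    return sprintf('&#%d;', $code_point);
}

print entity_for(0x142), "\n";   # LATIN SMALL LETTER L WITH STROKE -> &#322;
print entity_for(0x17A), "\n";   # LATIN SMALL LETTER Z WITH ACUTE  -> &#378;
```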
I've tried HTML::Entities::encode and HTML::Entities::encode_numeric, but these yield:
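Zak&Aring;&#130;ady Ku&Aring;&ordm;nicze
Zak&#xC5;&#x82;ady Ku&#xC5;&#xBA;nicze
That does not help, because each byte is encoded separately. In a browser it renders as:
ZakÅ‚ady KuÅºnicze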
Can anyone advise how to achieve this?
EDIT:
As ikegami showed, it works if use utf8 is in effect AND the string is a literal in the program:
perl -e 'use utf8; chomp; printf "%X\n", ord for split //, "Zakłady Kuźnicze"'
5A
61
6B
142
61
64
79
20
4B
75
17A
6E
69
63
7A
65
...but my input is actually coming in via STDIN, and it's not working from STDIN:
echo "Zakłady Kuźnicze" | perl -ne 'use utf8; chomp; printf "%X\n", ord for split //'
5A
61
6B
C5
82
61
64
79
20
4B
75
C5
BA
6E
69
63
7A
65
What subtlety am I missing here?