In Python, I can take a string and print it with its multibyte characters escaped as UTF-8 bytes:

$ python3 -c 'print("hello ☺ world".encode("utf-8"))'
b'hello \xe2\x98\xba world'

Or Unicode-escaped:

$ python3 -c 'print("hello ☺ world".encode("unicode-escape"))'
b'hello \\u263a world'

Can Perl do something like this? I tried `quotemeta`, but it does not seem to be the right tool:

$ perl -e 'print quotemeta("hello ☺ world\n");'
hello\ \�\�\�\ world\
  • FYI, you need `-Mutf8` or `use utf8;` for your source code to be interpreted from UTF-8 (since that's how Perl ends up receiving it from the command line or a file). This is independent of how you end up outputting it. – Grinnz Nov 08 '18 at 17:57
  • Possible duplicate of [How can I output UTF-8 from Perl?](https://stackoverflow.com/questions/627661/how-can-i-output-utf-8-from-perl) – user692942 Nov 10 '18 at 11:46
  • @Lankymart not a duplicate. The linked question is about outputting Unicode characters properly. This question is about escaping Unicode characters. The solutions (IO layers vs Data::Dumper) are completely different. – amon Nov 10 '18 at 12:10
  • @amon I may not be a Perl programmer, but using `print` to output multi byte characters seems the same however you spin it. – user692942 Nov 10 '18 at 12:12
  • @Lankymart I'm a Perl tag gold badge holder and could duplicate-close the question with a single vote if it were a duplicate. But understanding of Perl is not required here. That question asks how to output properly encoded Unicode characters, e.g. `😀`. This question additionally wants to output escapes, e.g. `\x{1f600}`. They both use `print` merely because they both want to *output* something, but they output something different. – amon Nov 10 '18 at 12:23
  • @amon Fair enough, my mistake. – user692942 Nov 10 '18 at 14:51
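To illustrate the point about `use utf8` raised in the first comment, here is a minimal sketch. It assumes the script file itself is saved as UTF-8; the variable names are only for illustration. Without the pragma, Perl sees the literal as raw bytes, which is why `quotemeta` in the question backslashes each of the three bytes of ☺ separately.

```perl
use strict;
use warnings;
use Encode qw(decode);

# No `use utf8` in effect here: the literal is 15 raw bytes,
# because ☺ occupies three bytes in UTF-8.
my $raw = "hello ☺ world";

my $chars;
{
    use utf8;                       # lexically scoped to this block
    $chars = "hello ☺ world";       # 13 characters, ☺ is one code point
}

print length($raw),   "\n";         # 15
print length($chars), "\n";         # 13

# Decoding the raw bytes yields the same character string.
print decode('UTF-8', $raw) eq $chars ? "same\n" : "different\n";
```

On the command line, `perl -Mutf8 -e '...'` should have the same effect as `use utf8;` inside a script.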

1 Answer

Data::Dumper, for one, can do this.

use utf8;
use Encode;
use Data::Dumper;
$Data::Dumper::Terse = 1;   # suppress the '$VAR1 = ...' prefix
$Data::Dumper::Useqq = 1;   # use double quotes and escape non-printable characters

print Dumper("hello ☺ world");
print Dumper(encode("UTF-8","hello ☺ world"));

Output:

"hello \x{263a} world"
"hello \342\230\272 world"

Update: the relevant function in the Data::Dumper module is qquote, so you can skip setting $Useqq and $Terse:

use utf8;
use Encode;
use Data::Dumper;

print Data::Dumper::qquote("hello ☺ world"), "\n";
print Data::Dumper::qquote(encode("UTF-8","hello ☺ world")), "\n";
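If the exact Python output format matters (`\u263a` and `\xe2\x98\xba` rather than `\x{263a}` and octal escapes), a small hand-rolled substitution is one alternative. The sketch below is not part of Data::Dumper; the helper names `escape_char`, `unicode_escape`, and `utf8_byte_escape` are hypothetical, and it only handles non-ASCII characters (unlike Python, it leaves backslashes and control characters untouched).

```perl
use strict;
use warnings;
use utf8;                           # the literals below contain ☺
use Encode qw(encode);

# Hypothetical helper: format one non-ASCII character roughly the way
# Python's "unicode-escape" codec does.
sub escape_char {
    my ($char) = @_;
    my $cp = ord $char;
    return sprintf('\\x%02x', $cp) if $cp <= 0xFF;
    return sprintf('\\u%04x', $cp) if $cp <= 0xFFFF;
    return sprintf('\\U%08x', $cp);
}

# Replace every non-ASCII character with its escape.
sub unicode_escape {
    my ($str) = @_;
    $str =~ s/([^\x00-\x7f])/escape_char($1)/ge;
    return $str;
}

# Encode to UTF-8 first, then escape every byte above 0x7F as \xNN.
sub utf8_byte_escape {
    my ($str) = @_;
    my $bytes = encode('UTF-8', $str);
    return join '', map { $_ < 0x80 ? chr($_) : sprintf('\\x%02x', $_) }
                    unpack('C*', $bytes);
}

print unicode_escape("hello ☺ world"), "\n";     # hello \u263a world
print utf8_byte_escape("hello ☺ world"), "\n";   # hello \xe2\x98\xba world
```

Both functions return plain ASCII, so the result can be printed without setting an output encoding layer.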
mob