How to display readable UTF-8 strings with Data::Dumper?

Question

I have some UTF-8 encoded strings in structures which I am dumping for debugging purposes with Data::Dumper.

A small test case is:

use utf8;
use v5.10;
use Data::Dumper;
say Dumper({да=>"не"});

It outputs

{
  "\x{434}\x{430}" => "\x{43d}\x{435}"
};

but I want to see

{
  "да" => "не"
};

Of course, my real structure is considerably more complex. How can I make the strings in the dumped structure readable while debugging? Maybe I have to process the output with chr somehow before passing it to warn/say?

I'd also encourage you to post an answer. It would be good to have this on Stack Overflow, since I can't find it there. (The answer needn't be elaborate, just rounded enough with a simple example.) Another good source is [Accents not respected in printing out with data::dumper](https://stackoverflow.com/q/22781754/4653379), but it doesn't mention the `Encode` way, so I wouldn't mark this as a duplicate. – zdim, May 23 '18 at 16:46

score 7 · Accepted Answer · edited Jan 17 '19 at 19:47

Just for debugging:

#!/usr/bin/perl
use strict;
use warnings;
use v5.10;
use utf8;
use Data::Dumper;
binmode STDOUT, ':utf8';

CASE_1: {
    # Redefine Data::Dumper::qquote() to do nothing
    no warnings 'redefine';
    local *Data::Dumper::qquote = sub { qq["${\(shift)}"] };
    # Use the Pure Perl implementation of Dumper
    local $Data::Dumper::Useperl = 1;

    say Dumper({да=>"не"});
}

CASE_2: {
    # Use YAML instead
    use YAML;
    say Dump({да=>"не"});
}

CASE_3: {
    # Evaluate the whole dumped string
    no strict 'vars';
    local $Data::Dumper::Terse = 1;

    my $var = Dumper({да=>"не"});
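    # Interpolate the \x{...} escapes; check the result so that die can actually fire
    my $out = eval "qq#$var#";
    die $@ unless defined $out;
    say $out;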
}

__END__
$VAR1 = {
          "да" => "не"
        };

---
да: не

{
  "да" => "не"
}

edited Jan 17 '19 at 19:47

David Tonhofer

answered May 24 '18 at 13:53

ernix

Just by switching to the pure Perl implementation of `Data::Dumper`, the output (written to STDOUT, which I have told Perl should be UTF-8 using `binmode STDOUT, ':encoding(UTF-8)';`) switches from `"Sau\x{f0}anes Airport"` (escaped ISO-8859-1; cannot be `eval`-ed back assuming UTF-8) to `'Sauðanes Airport'` (unescaped UTF-8). Is that some kind of bug? I am not amused. – David Tonhofer Jan 17 '19 at 19:52
@DavidTonhofer Wherever you prefer, gist maybe? – ernix Jan 18 '19 at 13:50
@ernix A gist exists at https://gist.github.com/dtonhofer/29c8d561c911cc93052f2bb2181ee75e -- although the output differs between implementations, I was unable to reproduce the read-back problems I encountered earlier. Reading yields the correct strings. Tested on Fedora 29 and CentOS 7.6. I have to review the processing pipeline I set up at work to look for the error. – David Tonhofer Jan 19 '19 at 14:53
@DavidTonhofer Try `Useqq(1)`, I also forked/replied to your gist. – ernix Jan 19 '19 at 18:07

score 1 · Answer 2 · answered Jan 24 '20 at 09:13

# Pass a reference so the hash is dumped as one structure, then unescape each \x{...}
print Dumper(\%mydata) =~ s/\\x\{([0-9a-f]{2,})\}/chr hex $1/ger;
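
For context, here is a self-contained sketch of the same substitution idea. The sample hash is hypothetical and not part of the answer. The sketch assumes that `Data::Dumper` emits wide characters as `\x{HEX}` escapes (as shown in the question) and that STDOUT is told to encode output as UTF-8. The `/r` modifier needs Perl 5.14 or later.

```perl
use v5.14;                            # /r (non-destructive substitution) needs 5.14+
use warnings;
use utf8;                             # this source file contains UTF-8 literals
use Data::Dumper;

binmode STDOUT, ':encoding(UTF-8)';   # encode wide characters when printing

# Hypothetical sample data, for illustration only.
my %mydata = ( да => "не" );

# Dump a reference so the hash is rendered as a single structure.
my $dump = Dumper(\%mydata);

# Replace each \x{HEX} escape with the character it denotes.
my $readable = $dump =~ s/\\x\{([0-9a-f]+)\}/chr hex $1/ger;

print $readable;
```

The result is meant for reading only. A string that already contains a literal `\x{...}` sequence would presumably be rewritten as well, so the output should not be treated as a faithful round-trip of the data.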

answered Jan 24 '20 at 09:13

leszekrabka

While this code may answer the question, providing additional context regarding why and/or how this code answers the question improves its long-term value. I would recommend you to check SO's [official How to Answer article](https://stackoverflow.com/help/how-to-answer) along with the comprehensive [blog post](https://codeblog.jonskeet.uk/2009/02/17/answering-technical-questions-helpfully/) from [Jon Skeet](https://stackoverflow.com/users/22656/jon-skeet). – Aleksey Potapov Jan 24 '20 at 09:32

Mihail N · Answer 3 · 2018-10-11T11:19:52.047

-1

Sorry, but I tested evaluating the whole dump and it produced incorrect results for my data, so I evaluate only the escape sequences instead:

Data::Dumper->new(\@_)
  ->Indent(1)->Sortkeys(1)->Terse(1)->Useqq(0)->Dump
  =~ s/((?:\\x\{[\da-f]+\})+)/eval '"'.$1.'"'/eigr;
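
As a sketch of how the expression above might be used, it can be wrapped in a small helper. The name `dump_readable` and the sample data are hypothetical. The example assumes that `Data::Dumper` single-quotes plain ASCII strings, so their `$` and `@` are left alone because only runs of `\x{...}` escapes are passed to `eval`.

```perl
use v5.14;                            # /r needs 5.14+
use warnings;
use utf8;                             # this source file contains UTF-8 literals
use Data::Dumper;

binmode STDOUT, ':encoding(UTF-8)';   # encode wide characters when printing

# Hypothetical helper name; the body is the expression from the answer.
sub dump_readable {
    return Data::Dumper->new(\@_)
        ->Indent(1)->Sortkeys(1)->Terse(1)->Useqq(0)->Dump
        # Evaluate only runs of \x{...} escapes, never the rest of the dump.
        =~ s/((?:\\x\{[\da-f]+\})+)/eval '"'.$1.'"'/eigr;
}

# Hypothetical data: an ASCII string with '$' and '@' next to Cyrillic text.
my $out = dump_readable({
    да    => 'не',
    price => 'user@example.com pays $5',
});

print $out;
```

Because the rest of the dump is never evaluated, nothing outside the escape sequences is interpolated, which appears to be the point of this variant compared with evaluating the whole dumped string.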

edited Oct 11 '18 at 11:19

answered Oct 11 '18 at 05:39

Mihail N

Pasted code needs more explanation for others. Please edit your post to explain why your code is better than the accepted answer. – Syfer Oct 11 '18 at 10:55
Again, sorry: my data contains the symbols $ and @. – Mihail N Oct 15 '18 at 07:48
