7

I have some UTF-8 encoded strings in structures which I am dumping for debugging purposes with Data::Dumper.

A small test case is:

use utf8;
use feature 'say';
use Data::Dumper;
say Dumper({да=>"не"});

It outputs

{
  "\x{434}\x{430}" => "\x{43d}\x{435}"
};

but I want to see

{
  "да" => "не"
};

Of course, my real structure is considerably more complex. How can I make the strings in the dumped structure readable while debugging? Do I perhaps have to post-process the output with chr somehow before passing it to warn/say?

David Tonhofer
Беров
    I'd also encourage you to post an answer. This is good to have on Stackoverflow while I can't find it. (The answer needn't be elaborate but just rounded enough with a simple example.) Another good source is [Accents not respected in printing out with data::dumper](https://stackoverflow.com/q/22781754/4653379) but it doesn't mention the `Encode` way so I wouldn't mark this as a duplicate – zdim May 23 '18 at 16:46

3 Answers

7

Just for debugging:

#!/usr/bin/perl
use strict;
use warnings;
use v5.10;
use utf8;
use Data::Dumper;
binmode STDOUT, ':utf8';

CASE_1: {
    # Redefine Data::Dumper::qquote() to do nothing
    no warnings 'redefine';
    local *Data::Dumper::qquote = sub { qq["${\(shift)}"] };
    # Use the Pure Perl implementation of Dumper
    local $Data::Dumper::Useperl = 1;

    say Dumper({да=>"не"});
}

CASE_2: {
    # Use YAML instead
    use YAML;
    say Dump({да=>"не"});
}

CASE_3: {
    # Evaluate the whole dumped string
    no strict 'vars';
    local $Data::Dumper::Terse = 1;

    my $var = Dumper({да=>"не"});
    # eval returns undef on failure, so check before printing
    my $out = eval "qq#$var#";
    die $@ unless defined $out;
    say $out;
}

__END__
$VAR1 = {
          "да" => "не"
        };

---
да: не

{
  "да" => "не"
}
David Tonhofer
ernix
  • Just by switching to the pure Perl implementation of `Data::Dumper`, the output (written to STDOUT, which I have told Perl should be UTF-8 using "`binmode STDOUT, ':encoding(UTF-8)';`") switches from `"Sau\x{f0}anes Airport"` (escaped ISO-8859-1; cannot be `eval`-ed back assuming UTF-8) to `'Sauðanes Airport'` (unescaped UTF-8). Is that some kind of bug? I am not amused. – David Tonhofer Jan 17 '19 at 19:52
  • @DavidTonhofer Wherever you prefer, gist maybe? – ernix Jan 18 '19 at 13:50
  • @ernix A gist exists at https://gist.github.com/dtonhofer/29c8d561c911cc93052f2bb2181ee75e -- although the output differs between implementations, I was unable to reproduce the read-back problems I encountered earlier. Reading yields the correct strings. Tested on Fedora 29 and CentOS 7.6. I have to review the processing pipeline I set up at work to look for the error. – David Tonhofer Jan 19 '19 at 14:53
  • @DavidTonhofer Try `Useqq(1)`, I also forked/replied to your gist. – ernix Jan 19 '19 at 18:07
1

print Dumper(\%mydata) =~ s/\\x\{([0-9a-f]{2,})\}/chr hex $1/ger;
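For context, here is a minimal, self-contained sketch of what that one-liner appears to do. Data::Dumper writes each wide character as `\x{HEX}`; the substitution captures the hex digits and `chr hex $1` turns them back into the character. The sample contents of `%mydata` are made up for illustration, and the sketch passes a hash reference so that the hash is dumped as a single structure. The `/r` flag needs Perl 5.14 or later.

```perl
use strict;
use warnings;
use utf8;                               # this source file contains UTF-8 literals
use Data::Dumper;

binmode STDOUT, ':encoding(UTF-8)';     # encode wide characters on output

$Data::Dumper::Terse    = 1;            # no "$VAR1 = " prefix
$Data::Dumper::Sortkeys = 1;            # stable key order

my %mydata = ( 'да' => 'не' );          # hypothetical sample data

# Dumper writes each wide character as \x{HEX}
my $escaped = Dumper(\%mydata);

# /e runs "chr hex $1" as code, /g handles every escape,
# /r returns the changed copy instead of modifying $escaped
my $readable = $escaped =~ s/\\x\{([0-9a-f]{2,})\}/chr hex $1/ger;

print $readable;
```

This is meant for debugging output only: a literal backslash followed by `x{...}` in the data would be rewritten too, so the result should not be fed back to `eval`.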
  • 1
    While this code may answer the question, providing additional context regarding why and/or how this code answers the question improves its long-term value. I would recommend you to check SO's [official How to Answer article](https://stackoverflow.com/help/how-to-answer) along with the comprehensive [blog post](https://codeblog.jonskeet.uk/2009/02/17/answering-technical-questions-helpfully/) from [Jon Skeet](https://stackoverflow.com/users/22656/jon-skeet). – Aleksey Potapov Jan 24 '20 at 09:32
-1

Sorry, but when I tested evaluating the whole dump, it broke on some of my data, so I evaluate only the escape sequences instead:

Data::Dumper->new(\@_)
  ->Indent(1)->Sortkeys(1)->Terse(1)->Useqq(0)->Dump
  =~ s/((?:\\x\{[\da-f]+\})+)/eval '"'.$1.'"'/eigr;
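Because the expression reads `@_`, it presumably lives inside a subroutine. A minimal sketch of such a wrapper follows; the name `dump_readable` and the sample data are hypothetical. Only runs of `\x{...}` escapes are handed to `eval`, so characters such as `#` or `$` elsewhere in the data are never interpolated.

```perl
use strict;
use warnings;
use utf8;                               # this source file contains UTF-8 literals
use Data::Dumper;

binmode STDOUT, ':encoding(UTF-8)';     # encode wide characters on output

# Hypothetical helper name; the body is the expression from the answer.
sub dump_readable {
    return Data::Dumper->new(\@_)
        ->Indent(1)->Sortkeys(1)->Terse(1)->Useqq(0)->Dump
        # eval sees only text matching \x{HEX} runs, wrapped in double quotes
        =~ s/((?:\\x\{[\da-f]+\})+)/eval '"'.$1.'"'/eigr;
}

# Sample data with characters that would upset a whole-dump eval "qq#...#"
my $out = dump_readable({ 'да' => 'не', note => '$x #y' });
print $out;
```

The trade-off compared with evaluating the whole dump is one `eval` per escape run, which should not matter for debugging output.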
Mihail N