
In Perl v5.10.1, I need to compare native Perl structures containing Unicode characters with similar structures created by JSON::decode_json.

Example:

use strict; use warnings;
#use utf8;
use JSON;
use Test::Deep qw(cmp_deeply);
cmp_deeply(["1"], JSON::decode_json('["1"]'), 'utf8 test 11'); # will pass
cmp_deeply(["≥"], JSON::decode_json('["1"]'), 'utf8 test ≥1'); # will fail
cmp_deeply(["1"], JSON::decode_json('["≥"]'), 'utf8 test 1≥'); # will fail
cmp_deeply(["≥"], JSON::decode_json('["≥"]'), 'utf8 test ≥≥'); # should pass

I can't explain what is going on in the last case: why are the two structures not equal? I tried to RTFM, but that didn't really improve my understanding of the issue.

Here is the output (slightly edited since TAP is too verbose):

ok 1 - utf8 test 11
not ok 2 - utf8 test ≥1
# Compared $data->[0]
#    got : '≥'
# expect : '1'
not ok 3 - utf8 test 1≥
Wide character in print at Test/Builder.pm line 1698.
# Compared $data->[0]
#    got : '1'
# expect : '≥'
not ok 4 - utf8 test ≥≥
Wide character in print at Test/Builder.pm line 1698.
# Compared $data->[0]
#    got : 'â¥'
# expect : '≥'

When I tried it with use utf8;, it was even worse (the script died after the 2nd test):

ok 1 - utf8 test 11
not ok 2 - utf8 test ≥1
Wide character in print at Test/Builder.pm line 1698.
Wide character in print at Test/Builder.pm line 1698.
Wide character in print at Test/Builder.pm line 1698.
# Compared $data->[0]
#    got : '≥'
# expect : '1'
Wide character in subroutine entry at ...
# Tests were run but no plan was declared and done_testing() was not seen.

I also tried a workaround, which works for the comparison...

use utf8;
cmp_deeply(["≥"], JSON->new->utf8(0)->decode('["≥"]'), 'utf8 test ≥≥');

...but I still get the stupid warning:

ok 1 - utf8 test ≥≥
Wide character in print at Test/Builder.pm line 1698.

Is there a way to just make it work - something like use magical_unicode_solution;?

Or perhaps I should do my tests in a different way that would make it compatible with Unicode?

Aprillion

1 Answer


This test will pass:

use Encode;
cmp_deeply( [ Encode::decode("utf8","≥") ],
    JSON::decode_json('["≥"]'), 'utf8 test ≥≥');

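The JSON decoder treats its input as UTF-8 encoded octets and returns a decoded character string. Without use utf8, the literal "≥" in your source is itself a UTF-8 encoded byte string, so your original test (test 4) compared an encoded string (three octets: E2 89 A5) with a decoded string (a single wide character, U+2265).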
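To make the difference concrete, here is a minimal sketch that compares the two kinds of string directly. It is only an illustration under a few assumptions: it uses the core JSON::PP module as a stand-in for JSON (both export a decode_json that expects UTF-8 octets), and it writes the character with escapes so the result does not depend on how the script file is saved. The variable names are made up for this example.

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use JSON::PP qw(decode_json);   # assumption: stands in for JSON::decode_json

# What a literal "≥" is WITHOUT `use utf8`: three UTF-8 octets.
my $octets = "\xE2\x89\xA5";
# What a literal "≥" is WITH `use utf8`: one character, U+2265.
my $chars  = "\x{2265}";

# decode_json takes UTF-8 octets and returns decoded character strings.
my $from_json = decode_json(qq(["$octets"]))->[0];

printf "octets length:    %d\n", length $octets;      # 3
printf "chars length:     %d\n", length $chars;       # 1
printf "from_json length: %d\n", length $from_json;   # 1

# Test 4 without `use utf8` effectively compared these two, which differ:
my $raw_matches = ($octets eq $from_json) ? 1 : 0;

# Decoding the native side first makes them equal:
my $decoded         = decode('UTF-8', $octets);
my $decoded_matches = ($decoded eq $from_json) ? 1 : 0;

# With `use utf8`, the JSON text is a character string, so it has to be
# encoded to octets before it is handed to decode_json:
my $via_utf8 = decode_json(encode('UTF-8', qq(["$chars"])))->[0];

print "raw matches:     $raw_matches\n";       # 0
print "decoded matches: $decoded_matches\n";   # 1
```

The last step is also the likely reason the use utf8 version died with "Wide character in subroutine entry": decode_json was given characters rather than octets.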

mob
  • thanks, this is what I figured out later as well, but I was hoping to avoid decoding the whole nested structure manually.. anyway, do you have any insight about why 2nd test printed out different stuff than test 4 for the same arrayref? – Aprillion Jul 30 '14 at 21:44
  • also, why are there only 2 warnings about `Wide character in print`, when the wide character is present in 3 tests? – Aprillion Jul 31 '14 at 09:08