Comparison of unicode string fails in Test::Deep::cmp_deeply when I use JSON::decode_json

Question

In Perl v5.10.1, I need to compare native Perl structures containing Unicode characters with similar structures created by JSON::decode_json.

Example:

use strict; use warnings;
#use utf8;
use JSON;
use Test::Deep qw(cmp_deeply);
cmp_deeply(["1"], JSON::decode_json('["1"]'), 'utf8 test 11'); # will pass
cmp_deeply(["≥"], JSON::decode_json('["1"]'), 'utf8 test ≥1'); # will fail
cmp_deeply(["1"], JSON::decode_json('["≥"]'), 'utf8 test 1≥'); # will fail
cmp_deeply(["≥"], JSON::decode_json('["≥"]'), 'utf8 test ≥≥'); # should pass

I cannot explain what is going on in the last case: why are the two structures not equal? I tried reading the documentation (RTFM), but that didn't really improve my understanding of the issue.

Here is the output (slightly edited since TAP is too verbose):

ok 1 - utf8 test 11
not ok 2 - utf8 test ≥1
# Compared $data->[0]
#    got : '≥'
# expect : '1'
not ok 3 - utf8 test 1≥
Wide character in print at Test/Builder.pm line 1698.
# Compared $data->[0]
#    got : '1'
# expect : '≥'
not ok 4 - utf8 test ≥≥
Wide character in print at Test/Builder.pm line 1698.
# Compared $data->[0]
#    got : 'â¥'
# expect : '≥'

When I tried it with use utf8;, it was even worse (the script died after the 2nd test):

ok 1 - utf8 test 11
not ok 2 - utf8 test ≥1
Wide character in print at Test/Builder.pm line 1698.
Wide character in print at Test/Builder.pm line 1698.
Wide character in print at Test/Builder.pm line 1698.
# Compared $data->[0]
#    got : '≥'
# expect : '1'
Wide character in subroutine entry at ...
# Tests were run but no plan was declared and done_testing() was not seen.

I also tried a workaround, which works for the comparison...

use utf8;
cmp_deeply(["≥"], JSON->new->utf8(0)->decode('["≥"]'), 'utf8 test ≥≥');

...but I still get the stupid warning:

ok 1 - utf8 test ≥≥
Wide character in print at Test/Builder.pm line 1698.

Is there a way to just make it work - something like use magical_unicode_solution;?

Or perhaps I should do my tests in a different way that would make it compatible with Unicode?

score 3 · Answer 1 · answered Jul 30 '14 at 19:03

This test will pass:

use Encode;
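# Without `use utf8`, the literal "≥" is a string of UTF-8 octets;
# decode it into a character string before comparing.
cmp_deeply( [ Encode::decode("utf8","≥") ],
    JSON::decode_json('["≥"]'), 'utf8 test ≥≥');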

The JSON decoder treats its input as UTF-8 encoded octets and returns decoded character strings. Your original test 4 (without use utf8) compared a UTF-8 encoded string (three octets: E2 89 A5) with a decoded string (a single wide character, U+2265).
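To make the difference between the two strings concrete, here is a minimal sketch that uses only the core Encode module. The variable names are illustrative, and the octets are written as explicit escapes so the result does not depend on whether use utf8 is in effect:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# The UTF-8 encoding of U+2265 (GREATER-THAN OR EQUAL TO), written as
# explicit bytes so it behaves the same with or without `use utf8`.
my $octets = "\xE2\x89\xA5";

# Decoding turns the three octets into one character.
my $chars = decode('UTF-8', $octets);

printf "octets: length %d\n", length($octets);                              # 3
printf "chars:  length %d, code point U+%04X\n", length($chars), ord($chars); # 1, U+2265

# The two strings only compare equal once both are in the same form.
print "equal as-is:        ", ($octets eq $chars ? "yes" : "no"), "\n";      # no
print "equal after encode: ", ($octets eq encode('UTF-8', $chars) ? "yes" : "no"), "\n"; # yes
```

This is the same mismatch cmp_deeply reported in test 4: one side held the octets, the other the decoded character.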

answered Jul 30 '14 at 19:03

mob

thanks, this is what I figured out later as well, but I was hoping to avoid decoding the whole nested structure manually.. anyway, do you have any insight about why 2nd test printed out different stuff than test 4 for the same arrayref? – Aprillion Jul 30 '14 at 21:44
also, why are there only 2 warnings about `Wide character in print`, when the wide character is present in 3 tests? – Aprillion Jul 31 '14 at 09:08
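
On the `Wide character in print` warnings: Test::Builder prints through its own duplicates of STDOUT and STDERR, so calling binmode on STDOUT alone does not reach them. The Test::More documentation suggests setting the layer on the builder's handles instead. A minimal sketch, assuming the test file is saved as UTF-8 and using Test::More's is() in place of cmp_deeply to keep it to core modules:

```perl
use strict;
use warnings;
use utf8;          # string literals in this file are character strings
use Test::More;

# Apply a UTF-8 output layer to each handle Test::Builder writes to.
my $builder = Test::More->builder;
binmode($builder->output,         ':encoding(UTF-8)');
binmode($builder->failure_output, ':encoding(UTF-8)');
binmode($builder->todo_output,    ':encoding(UTF-8)');

# Both sides are character strings, and the test name contains a wide
# character that is now encoded on output instead of triggering a warning.
is('≥', "\x{2265}", 'utf8 test ≥≥');

done_testing();
```

With use utf8 in effect the expected data is already decoded, so the JSON side only needs to return character strings as well, which is what the JSON->new->utf8(0)->decode workaround in the question does.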
