I have hash keys that look like this:

1я310яHOM_REF_truth:HOM_ALT_test:discordant_hom_ref_to_hom_altяAяC

This is a string joined by the Cyrillic letter я, which I chose as a delimiter because it will never appear in these files.

I write this to a JSON file in Perl 5.30.2 thus:

use JSON 'encode_json';

sub hash_to_json_file {
    my $hash     = shift;
    my $filename = shift;
    my $json = encode_json $hash;
    open my $out, '>', $filename;
    say $out $json
}

and in python 3.8:

import json
def hash_to_json_file(hashtable,filename):
    json1=json.dumps(hashtable)
    f = open(filename,"w+")
    print(json1,file=f)
    f.close()

When I try to load a JSON file written by Python back into a Perl script, I see a cryptic error that I don't know how to solve:

Wide character in say at read_json.pl line 27.

After reading https://perldoc.perl.org/perlunifaq.html, I tried adding use utf8 to my script, but it doesn't help. I've also tried '>:encoding(UTF-8)' within my subroutine, but the same error results.

Upon inspection of the JSON files, I see keys like "1Ñ180ÑHET_ALT_truth:HET_REF_test:discordant_het_alt_to_het_refÑAÑC,G", where Ñ appears in place of я (as in ÑAÑ). In the JSON written by Python, I see \u044f instead. I think this is the wide character, but I don't know how to change it back.
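
To illustrate the two observations above, here is a minimal Python sketch. It uses a shortened sample key, and it assumes the Ñ characters come from the UTF-8 bytes of я being displayed as Latin-1, which is a guess about how the file was viewed. It shows that \u044f is simply the JSON escape for я and that it round-trips without loss.

```python
import json

key = "1я310яAяC"  # shortened sample key, for illustration only

# json.dumps escapes non-ASCII characters by default (ensure_ascii=True).
escaped = json.dumps(key)
print(escaped)

# Parsing the escaped form gives back the original character.
restored = json.loads(escaped)

# The UTF-8 bytes of "я" (0xD1 0x8F), misread as Latin-1,
# become "Ñ" followed by an invisible control character.
mojibake = "я".encode("utf-8").decode("latin-1")
print(repr(mojibake))
```

So neither form means the data is damaged: the escape is decoded by any JSON parser, and the Ñ form is a display problem rather than a different key.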

I've also tried changing my subroutine:

use Encode 'decode';
sub json_file_to_hash {
   my $file = shift;
   open my $in, '<:encoding(UTF-8)', $file;
   my $json = <$in>;
   my $ref = decode_json $json;
   $ref = decode('UTF-8', $json);
   return %{ $ref }
}

but this gives another error:

Wide character in hash dereference at read_json.pl line 17, <$_[...]> line 1

How can I get python JSON read into Perl correctly?

con

2 Answers

use utf8;                               # Source is encoded using UTF-8
use open ':std', ':encoding(UTF-8)';    # For say to STDOUT.  Also default for open()

use feature qw( say );                  # Enables say()

use JSON qw( decode_json encode_json );

sub hash_to_json_file {
    my $qfn = shift;
    my $ref = shift;
    my $json = encode_json($ref);       # Produces UTF-8
    open(my $fh, '>:raw', $qfn)         # Write it unmangled
       or die("Can't create \"$qfn\": $!\n");

    say $fh $json;
}

sub json_file_to_hash {
    my $qfn = shift;
    open(my $fh, '<:raw', $qfn)         # Read it unmangled
       or die("Can't open \"$qfn\": $!\n");

    local $/;                           # Read whole file
    my $json = <$fh>;                   # This is UTF-8
    my $ref = decode_json($json);       # This produces decoded text
    return $ref;                        # Return the ref rather than the keys and values.
}

my $src = { key => "1я310яHOM_REF_truth:HOM_ALT_test:discordant_hom_ref_to_hom_altяAяC" };
hash_to_json_file("a.json", $src);
my $dst = json_file_to_hash("a.json");
say $dst->{key};

You could also avoid using :raw by using from_json and to_json.

use utf8;                               # Source is encoded using UTF-8
use open ':std', ':encoding(UTF-8)';    # For say to STDOUT. Also default for open()

use feature qw( say );                  # Enables say()

use JSON qw( from_json to_json );

sub hash_to_json_file {
    my $qfn  = shift;
    my $hash = shift;
    my $json = to_json($hash);          # Produces decoded text.
    open(my $fh, '>', $qfn)             # "use open" will add :encoding(UTF-8)
       or die("Can't create \"$qfn\": $!\n");

    say $fh $json;                      # Encoded by :encoding(UTF-8)
}

sub json_file_to_hash {
    my $qfn = shift;
    open(my $fh, '<', $qfn)             # "use open" will add :encoding(UTF-8)
       or die("Can't open \"$qfn\": $!\n");

    local $/;                           # Read whole file
    my $json = <$fh>;                   # Decoded text thanks to "use open".
    my $ref = from_json($json);         # $ref contains decoded text.
    return $ref;                        # Return the ref rather than the keys and values.
}

my $src = { key => "1я310яHOM_REF_truth:HOM_ALT_test:discordant_hom_ref_to_hom_altяAяC" };
hash_to_json_file("a.json", $src);
my $dst = json_file_to_hash("a.json");
say $dst->{key};
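
For comparison, the same two reading strategies can be sketched on the Python side: either hand the parser raw UTF-8 bytes, or let the file handle decode and hand the parser text. This is only an illustration under assumptions: the key is shortened and the scratch file path is hypothetical.

```python
import json
import os
import tempfile

data = {"key": "1я310яAяC"}  # shortened sample key, for illustration only

# Hypothetical scratch file for the demonstration.
path = os.path.join(tempfile.mkdtemp(), "a.json")

# Write UTF-8 encoded JSON, keeping я as a literal character.
with open(path, "w", encoding="utf-8") as f:
    json.dump(data, f, ensure_ascii=False)

# Approach 1: read raw bytes and let the JSON parser decode them.
with open(path, "rb") as f:
    from_bytes = json.loads(f.read())

# Approach 2: let the file handle decode, then parse the text.
with open(path, "r", encoding="utf-8") as f:
    from_text = json.load(f)

print(from_bytes == from_text == data)
```

In both languages the point is the same: decode exactly once, either in the I/O layer or in the JSON parser, but not in both.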
ikegami

I like the ascii option, which makes the JSON output all 7-bit ASCII:

my $json = JSON->new->ascii->encode($hash);

Both the Perl and Python JSON modules will be able to read it.
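
For reference, Python's json.dumps already behaves this way by default through its ensure_ascii=True setting, which appears to be the counterpart of ->ascii. A small sketch, assuming a shortened sample key:

```python
import json

data = {"1я310яAяC": 1}  # shortened sample key, for illustration only

# Default: every non-ASCII character is written as a \uXXXX escape.
ascii_json = json.dumps(data)

# ensure_ascii=False keeps я as a literal character instead.
literal_json = json.dumps(data, ensure_ascii=False)

print(ascii_json)
print(literal_json)

# Both forms parse back to the same dictionary.
same = json.loads(ascii_json) == json.loads(literal_json) == data
print(same)
```

The ASCII form is larger but does not depend on the file's encoding being handled correctly by the reader.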

mob
  • 1) You suggested replacing `decode_json` with `my $json = JSON->new->ascii->encode($hash);`. That's not going to work. 2) `->ascii` has no effect when decoding. 3) There's no way to encode `1я310яHOM_REF_truth:HOM_ALT_test:discordant_hom_ref_to_hom_altяAяC` using ASCII. Remember, the *reader* doesn't output JSON. – ikegami Apr 15 '20 at 23:44
  • I'm suggesting replacing `encode_json` in the `hash_to_json_file` subroutine. – mob Apr 15 '20 at 23:53
  • `hash_to_json_file` works fine. Again, the error the OP is asking about is from the program *reading* the JSON. Your suggestion doesn't help at all. It just makes the JSON larger without addressing the issue. – ikegami Apr 15 '20 at 23:59