I have some binary data which is in Python in the form of an array of byte strings.

Is there a portable way to serialize this data that other languages could read?

JSON is out, because I just found out that it has no real way to store binary data; its strings are expected to be Unicode.

I don't want to use pickle because of the security risk, and because it would limit the data to being read by other Python programs.

Any advice? I would really like to use a builtin library (or at least one that's part of the standard Anaconda distribution).

Russia Must Remove Putin
Jason S

1 Answer

If you just need the binary data in the strings and can recover the boundaries between the individual strings easily, you could just write them to a file directly, as raw strings.
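If the boundaries are not naturally recoverable, one common way to make them so is to prefix each string with its length. This is not part of the answer above; it is a minimal sketch, assuming a 4-byte little-endian length prefix, and the helper names `write_chunks` and `read_chunks` are made up for illustration. Only the standard `struct` and `io` modules are used.

```python
import io
import struct


def write_chunks(stream, chunks):
    """Write each byte string preceded by its length as a 4-byte little-endian integer."""
    for chunk in chunks:
        stream.write(struct.pack("<I", len(chunk)))
        stream.write(chunk)


def read_chunks(stream):
    """Read length-prefixed byte strings until the stream is exhausted."""
    chunks = []
    while True:
        header = stream.read(4)
        if not header:
            break  # clean end of stream
        if len(header) != 4:
            raise ValueError("truncated length prefix")
        (length,) = struct.unpack("<I", header)
        payload = stream.read(length)
        if len(payload) != length:
            raise ValueError("truncated record")
        chunks.append(payload)
    return chunks


# Round trip through an in-memory buffer; a file opened in binary mode works the same way.
buf = io.BytesIO()
write_chunks(buf, [b"abc\xf3\x9c\xc6", b"xyz", b""])
buf.seek(0)
restored = read_chunks(buf)
```

A reader in another language only needs to know the prefix width and byte order to parse this layout.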

If you can't recover the string boundaries easily, JSON seems like a good option:

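import json

a = [b"abc\xf3\x9c\xc6", b"xyz"]
# Decode each byte string to text so that JSON can store it
serialised = json.dumps([s.decode("latin1") for s in a])
# After loading, encode the text back to the original bytes
print([s.encode("latin1") for s in json.loads(serialised)])

will print (in Python 3)

[b'abc\xf3\x9c\xc6', b'xyz']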

The trick here is that latin1 maps every byte value from 0 to 255 to the Unicode code point with the same number, so arbitrary binary strings are valid latin1: they can always be decoded to Unicode and encoded back to the original bytes again.
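The round-trip claim can be checked over all 256 byte values. The following is a small sketch; the helper names `dump_bytes_list` and `load_bytes_list` are hypothetical wrappers around the same latin1 approach, not an existing API.

```python
import json


def dump_bytes_list(chunks):
    """Serialise a list of byte strings as a JSON array of latin1-decoded strings."""
    return json.dumps([c.decode("latin1") for c in chunks])


def load_bytes_list(text):
    """Inverse of dump_bytes_list: recover the original byte strings."""
    return [s.encode("latin1") for s in json.loads(text)]


# Every possible byte value, plus an empty string and the sample from above.
original = [bytes(range(256)), b"", b"abc\xf3\x9c\xc6"]
text = dump_bytes_list(original)
restored = load_bytes_list(text)

assert restored == original
# With json.dumps defaults (ensure_ascii=True) the output is pure ASCII.
assert all(ord(ch) < 128 for ch in text)
```

Note that with the default settings, bytes above 0x7F and most control bytes are written as six-character `\uXXXX` escapes, so this encoding can be several times larger than the raw data.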

Sven Marnach
  • The boundaries aren't impossible to recreate, but they're not easy to get either (the [packet-framing problem](http://www.embeddedrelated.com/showarticle/113.php)). So yeah, I can live with JSON's overhead given its widespread nature. – Jason S Mar 24 '14 at 22:22