I'm getting different outputs pickling a string when using Python 2 and Python 3 (due to different str types I suppose).

Python 2:

Python 2.7.12 (default, Dec  4 2017, 14:50:18) 
[GCC 5.4.0 20160609]
>>> import pickle, base64
>>> a = pickle.dumps('test')
>>> base64.b64encode(a)
'Uyd0ZXN0JwpwMAou'

Python 3:

Python 3.5.2 (default, Nov 23 2017, 16:37:01) 
[GCC 5.4.0 20160609]
>>> import pickle, base64
>>> a = pickle.dumps('test')
>>> base64.b64encode(a)
b'gANYBAAAAHRlc3RxAC4='

How can I modify the code to get the same results when pickling a string?

EDIT:

With protocol=2 I still get different pickles:

# Python 2
>>> base64.b64encode(pickle.dumps('test', protocol=2))
'gAJVBHRlc3RxAC4='

# Python 3
>>> base64.b64encode(pickle.dumps('test', protocol=2))
b'gAJYBAAAAHRlc3RxAC4='
JoaoAlby
  • Change [the protocol](https://docs.python.org/3/library/pickle.html#data-stream-format) in your `pickle.dumps()` call. [Possibly related question](https://stackoverflow.com/questions/23582489/python-pickle-protocol-choice) – sco1 Feb 12 '18 at 17:16
  • Try adding `protocol=2, fix_imports=True` to the dumps statement in version 3 – PaW Feb 12 '18 at 17:24
  • Do you want identical output, or data compatibility? Either pickled variant works correctly with `pickle.loads()`, in both Python 2 and Python 3. I suppose something like data padding may differ a little bit, leading to variations in serialization. I suspect the difference is in the unicode vs byte string. – 9000 Feb 12 '18 at 17:53

1 Answer

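Python can use different pickle protocol versions when pickling, and the default protocol differs between Python 2 and Python 3.

Pass the protocol version explicitly, e.g. pickle.dumps('test', protocol=2), so that both versions write the same stream format. The bytes can still differ, because 'test' is a byte string in Python 2 and a Unicode string in Python 3, and pickle records that difference.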
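If byte-identical output is the goal, one possible approach is to pin the protocol and pickle a Unicode string on both sides. The sketch below assumes the value really is text; the helper name `dump_text` is hypothetical, and the `u''` prefix is accepted by Python 2 and by Python 3.3+.

```python
import base64
import pickle


def dump_text(text):
    """Pickle a Unicode string with protocol 2 and return it base64-encoded.

    dump_text is an illustrative helper, not part of the pickle module.
    """
    return base64.b64encode(pickle.dumps(text, protocol=2))


# u'test' is a unicode object in Python 2 and a str in Python 3.3+.
encoded = dump_text(u'test')
print(encoded)
```

On Python 3 this prints b'gAJYBAAAAHRlc3RxAC4=', the same value as the Python 3 output in the question. Python 2.7 should write the same bytes for u'test', but that is worth checking on your own interpreter.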

Note: the exact bytes may differ, but unpickling gives the same text either way, apart from "unicode" vs "str" in Python 2:

# Python 2.7 output:
>>> base64.b64encode(pickle.dumps('test', protocol=2))
'gAJVBHRlc3RxAC4='
# Decode output from Python 3:
>>> pickle.loads(base64.b64decode('gAJYBAAAAHRlc3RxAC4='))
u'test'

# Python 3.6 output:
>>> base64.b64encode(pickle.dumps('test', protocol=2))
b'gAJYBAAAAHRlc3RxAC4='
# Decoding Python 2's output:
>>> pickle.loads(base64.b64decode('gAJVBHRlc3RxAC4='))
'test'  # Note, not u'test'.
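
To see where the two protocol=2 pickles above diverge, the standard pickletools module can list the opcodes in each one. This is a minimal sketch run on Python 3; the constant names and the `opcode_names` helper are made up for illustration.

```python
import base64
import pickletools

# The two protocol=2 payloads quoted above.
PY2_PICKLE = base64.b64decode('gAJVBHRlc3RxAC4=')      # written by Python 2.7
PY3_PICKLE = base64.b64decode('gAJYBAAAAHRlc3RxAC4=')  # written by Python 3


def opcode_names(data):
    """Return the names of the pickle opcodes found in data, in order."""
    return [opcode.name for opcode, arg, pos in pickletools.genops(data)]


py2_ops = opcode_names(PY2_PICKLE)
py3_ops = opcode_names(PY3_PICKLE)
print(py2_ops)
print(py3_ops)
```

The only opcode that differs is the one carrying the string: SHORT_BINSTRING (a byte string) in the Python 2 pickle and BINUNICODE (a Unicode string) in the Python 3 pickle.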
9000