The Python language is formally defined in "The Python Language Reference" (1), which includes details about the grammar of Python source code. "The Python Standard Library" (2) reference does the same for the standard library, including the Pickle module (3). However, at no point does that reference actually explain the structure of the Pickle format.

Based on the output of pickletools, and from reading some explanations of the format, it is evident that Pickle is:

a stack based virtual machine which keeps track of objects. The file is just a list of serialized opcodes, the first one being expected to be the protocol version and the last one a stop opcode. When the stop opcode is met, the current object on the stack is popped. (4).
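
To make that description concrete, here is a minimal sketch that lists the opcodes in a small pickle using `pickletools.genops` from the standard library. The helper name `opcode_names` and the choice of protocol 2 are my own assumptions for illustration. Protocol 2 is picked only because it is the first protocol that emits a leading `PROTO` opcode.

```python
import pickle
import pickletools


def opcode_names(obj, protocol=2):
    """Return the names of the opcodes in the pickle stream for obj.

    Hypothetical helper for illustration; protocol=2 is an assumption,
    chosen because protocols 2 and later start with a PROTO opcode.
    """
    data = pickle.dumps(obj, protocol=protocol)
    # genops yields (OpcodeInfo, argument, byte position) for each opcode.
    return [info.name for info, arg, pos in pickletools.genops(data)]


names = opcode_names([1, 2])
print(names)

# Human-readable disassembly of the same stream.
pickletools.dis(pickle.dumps([1, 2], protocol=2))
```

The first name in the list should be `PROTO` and the last `STOP`, which matches the quoted description. Everything in between is the sequence of stack operations that rebuilds the object.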

As far as I can tell, neither this structure nor the list of opcodes is defined in any official Python document. Certain PEPs (5) (6) (7) describe opcodes added in newer versions of the protocol, but the original list of opcodes is nowhere to be found.

Where is the Pickle format formally defined? Is it just the Python source code? If so, how would it be possible to create a new Python implementation when the standard library has no formal specification?

Migwell
  • Pickle isn't part of the Python implementation, it's a module written in Python. – Barmar May 16 '21 at 07:45
  • https://github.com/python/cpython/blob/main/Lib/pickle.py – Barmar May 16 '21 at 07:45
  • And as the docs page on pickle notes, under **Data stream format**, _"`pickletools` source code has extensive comments about opcodes used by pickle protocols"_. [That file](https://github.com/python/cpython/blob/main/Lib/pickletools.py) describes itself as "executable documentation". – jonrsharpe May 16 '21 at 07:48
  • Considering that last point I would be happy to accept this as an answer. The `pickletools` comments, particularly here: https://github.com/python/cpython/blob/1a08c5ac49b748d5e4e4b508d22d3804e3cd4dcc/Lib/pickletools.py#L38-L154 are indeed quite comprehensive. – Migwell May 16 '21 at 07:54
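
For anyone landing here later: the opcode table behind those comments can also be inspected at runtime through `pickletools.opcodes`, a list of `OpcodeInfo` objects. The following is a rough sketch, not an official way to obtain a specification. The helper name `opcodes_by_protocol` is hypothetical, and the grouping by `proto` (the protocol version that introduced each opcode) is just one way to slice the table.

```python
import pickletools


def opcodes_by_protocol():
    """Group opcode names by the protocol version that introduced them.

    Hypothetical helper for illustration; it only reads the public
    pickletools.opcodes list.
    """
    table = {}
    for info in pickletools.opcodes:
        table.setdefault(info.proto, []).append(info.name)
    return table


table = opcodes_by_protocol()
for proto in sorted(table):
    print(f"protocol {proto}: {len(table[proto])} opcodes")

# Each entry also carries its one-byte code and a documentation string.
stop = next(info for info in pickletools.opcodes if info.name == "STOP")
print(repr(stop.code))
print(stop.doc)
```

This reads the same data that `pickletools.dis` uses, so it is documentation derived from the CPython source rather than an independent specification.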

0 Answers