The Python language is formally defined in "The Python Language Reference" (1), which includes details about the grammar of Python source code. "The Python Standard Library" (2) reference does the same for the standard library, including the pickle module (3). However, at no point does the library reference explain the structure of the Pickle format itself.
Based on the output of pickletools, and from reading some explanations of the format, it is evident that Pickle is:
a stack based virtual machine which keeps track of objects. The file is just a list of serialized opcodes, the first one being expected to be the protocol version and the last one a stop opcode. When the stop opcode is met, the current object on the stack is popped. (4).
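For illustration, this is roughly how I looked at the opcode stream. It is only a minimal sketch: the sample list and the choice of protocol 2 are arbitrary, and the variable names are my own. It uses pickletools.genops to list the opcode names and pickletools.dis to print a disassembly.

```python
import pickle
import pickletools

# Arbitrary sample object and protocol, chosen only for illustration.
data = pickle.dumps([1, 2], protocol=2)

# genops yields (opcode, argument, position) triples for each opcode in the stream.
ops = [opcode.name for opcode, arg, pos in pickletools.genops(data)]
print(ops)

# dis prints a human-readable, annotated disassembly to stdout.
pickletools.dis(data)

# Loading the bytes runs the opcodes and returns the reconstructed object.
restored = pickle.loads(data)
```

With protocol 2 the first opcode listed is PROTO and the last is STOP, which matches the description quoted above, but that is something I observed from the tool, not something I found written down in a specification.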
As far as I can tell, neither this structure nor the list of opcodes is defined in any official Python document. Certain PEPs (5) (6) (7) describe opcodes added in newer protocol versions, but the original list of opcodes is nowhere to be found.
Where is the Pickle format formally defined? Is it just the Python source code? If this is true, how would it be possible to create a new Python implementation if the standard library has no formal specification?