Take a look at the source code of the underlying _pickle module (written in C); there is only one place where this error can be raised:
static Py_ssize_t
marker(UnpicklerObject *self)
{
    Py_ssize_t mark;
    if (self->num_marks < 1) {
        PickleState *st = _Pickle_GetGlobalState();
        PyErr_SetString(st->UnpicklingError, "could not find MARK");
        return -1;
    }
    ...
}
Under the hood, the pickle module uses a mark stack when unpickling container objects, and num_marks
holds the number of marks currently on it. When an Unpickler is initialized, num_marks
is set to 0; it is then incremented each time a new mark is pushed onto the mark stack.
One way to get this error is to alter the FRAME opcode (the value that marks the start of a frame) or the MARK opcode. Let's assume we use protocol 4 (binary framing was introduced in this version; you can read more in PEP 3154). The idea is simple: split the content into chunks (frames) and mark the boundaries of each frame. Let's dive into it.
Consider this example:
import pickle
data = {"fruits": ["apple", "banana", "pineapple"]}
with open("data.pickle", 'wb') as f:
    # Pickle the 'data' dictionary using the highest protocol available.
    pickle.dump(data, f, pickle.HIGHEST_PROTOCOL)
Let's investigate a bit with pickletools:
python -m pickletools data.pickle
0: \x80 PROTO 5
2: \x95 FRAME 46
11: } EMPTY_DICT
12: \x94 MEMOIZE (as 0)
13: \x8c SHORT_BINUNICODE 'fruits'
21: \x94 MEMOIZE (as 1)
22: ] EMPTY_LIST
23: \x94 MEMOIZE (as 2)
24: ( MARK
25: \x8c SHORT_BINUNICODE 'apple'
32: \x94 MEMOIZE (as 3)
33: \x8c SHORT_BINUNICODE 'banana'
41: \x94 MEMOIZE (as 4)
42: \x8c SHORT_BINUNICODE 'pineapple'
53: \x94 MEMOIZE (as 5)
54: e APPENDS (MARK at 24)
55: s SETITEM
56: . STOP
the highest protocol among opcodes = 4
Here FRAME (0x95) marks the start of a new frame and MARK ('(') marks the start of the container object's contents. These byte values are fixed by the pickle protocol (this is an implementation detail).
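To make this concrete, here is a minimal sketch that reproduces the error by overwriting the MARK byte. It assumes CPython with the C accelerator module (_pickle) in use, pins protocol 4 so the layout matches the listing above, and picks the NONE opcode ('N') as an arbitrary replacement; the variable names are only illustrative.

```python
import pickle
import pickletools

data = {"fruits": ["apple", "banana", "pineapple"]}
payload = pickle.dumps(data, protocol=4)

# Find the byte offset of the MARK opcode by walking the opcode stream.
mark_pos = next(
    pos for op, arg, pos in pickletools.genops(payload) if op.name == "MARK"
)

# Overwrite MARK with NONE ('N'), so the later APPENDS finds no mark.
corrupted = bytearray(payload)
corrupted[mark_pos:mark_pos + 1] = b"N"

error_message = None
try:
    pickle.loads(bytes(corrupted))
except pickle.UnpicklingError as exc:
    error_message = str(exc)

print(error_message)
```

When APPENDS is reached, the unpickler asks for the most recent mark, the mark stack is empty, and marker() reports the failure.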
I would first take a look at the content of the pickle file. To summarise, the possible causes are:
- The file is somehow corrupted.
- The file pointer does not point to the start of the file; this can most likely be fixed by calling f.seek(0) before loading, see "_pickle.UnpicklingError: could not find MARK".
...
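The file-pointer cause can be sketched as follows. This is only an illustration, assuming CPython with the C accelerator and an in-memory io.BytesIO standing in for a real file; the stream position is deliberately left just past the MARK byte to simulate earlier code having consumed part of the file.

```python
import io
import pickle

data = {"fruits": ["apple", "banana", "pineapple"]}
buf = io.BytesIO()
pickle.dump(data, buf, protocol=4)

# Simulate a stream that was already partially read: place the position
# just past the MARK byte ('('), so unpickling starts mid-stream.
buf.seek(buf.getvalue().index(b"(") + 1)

seek_error = None
try:
    pickle.load(buf)
except pickle.UnpicklingError as exc:
    seek_error = str(exc)
print(seek_error)

# Rewinding to the start of the stream makes the load succeed.
buf.seek(0)
restored = pickle.load(buf)
print(restored == data)
```

If the position is instead at the very end of the stream, pickle.load raises EOFError rather than this error, so the exact exception depends on where the pointer happens to be.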
Found another cause? Feel free to edit the answer.