Here is my code in Python:

import argparse
import pickle

parser = argparse.ArgumentParser()
parser.add_argument('--use_model', type=str, help='model location', required=True)
parser.add_argument('--model_dim', type=int, help='model dimension of words', required=True)
args = parser.parse_args()
# Open the pickled model in binary mode; the file is closed when the block exits.
with open(args.use_model, "rb") as f:
    f.seek(0)
    trained_model = pickle.load(f)

I get the error _pickle.UnpicklingError: could not find MARK on the last line.

In this question:

_pickle.UnpicklingError: could not find MARK

it says that f.seek(0) solves the problem, but in my case it didn't.

djvg
Alireza Pir

3 Answers

Take a look at the source code of the underlying _pickle module (written in C). There is only one place where this error can be raised:

static Py_ssize_t
marker(UnpicklerObject *self)
{
    Py_ssize_t mark;

    if (self->num_marks < 1) {
        PickleState *st = _Pickle_GetGlobalState();
        PyErr_SetString(st->UnpicklingError, "could not find MARK");
        return -1;
    }
...
}

Under the hood, the pickle module uses a mark stack for unpickling container objects, and num_marks indicates how many marks are currently on it. When an Unpickler is initialized, num_marks is set to 0; it is then incremented each time a new mark is pushed onto the mark stack.
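
A minimal sketch of that condition, assuming CPython with the C accelerator (_pickle) in use. The byte string below is hand-written for illustration: it contains an APPENDS opcode that was never preceded by MARK, so the mark stack is still empty when the unpickler looks for a mark.

```python
import pickle

# Hand-written stream: EMPTY_LIST (']'), APPENDS ('e'), STOP ('.').
# APPENDS expects a MARK on the mark stack, but none was ever pushed.
stream_without_mark = b"]e."

try:
    pickle.loads(stream_without_mark)
    error_message = None
except pickle.UnpicklingError as exc:
    error_message = str(exc)

print(error_message)
```

The pure-Python unpickler may report this situation differently, so the exact message should be treated as specific to the C implementation.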

One possible way to get this error is for the FRAME opcode (the value that marks the start of a frame) or the MARK opcode to be altered. Let's assume protocol 4, the version in which pickle introduced binary framing; you can read more in PEP 3154. The idea is simple: split the content into chunks (frames) and mark the boundaries of every frame. Let's dive into it.

Consider this example:

import pickle
data = {"fruits": ["apple", "banana", "pineapple"] }

with open("data.pickle", 'wb') as f:
    # Pickle the 'data' dictionary using the highest protocol available.
    pickle.dump(data, f, pickle.HIGHEST_PROTOCOL)

Let's investigate a bit with pickletools:

python -m pickletools data.pickle
    0: \x80 PROTO      5
    2: \x95 FRAME      46
   11: }    EMPTY_DICT
   12: \x94 MEMOIZE    (as 0)
   13: \x8c SHORT_BINUNICODE 'fruits'
   21: \x94 MEMOIZE    (as 1)
   22: ]    EMPTY_LIST
   23: \x94 MEMOIZE    (as 2)
   24: (    MARK
   25: \x8c     SHORT_BINUNICODE 'apple'
   32: \x94     MEMOIZE    (as 3)
   33: \x8c     SHORT_BINUNICODE 'banana'
   41: \x94     MEMOIZE    (as 4)
   42: \x8c     SHORT_BINUNICODE 'pineapple'
   53: \x94     MEMOIZE    (as 5)
   54: e        APPENDS    (MARK at 24)
   55: s    SETITEM
   56: .    STOP
the highest protocol among opcodes = 4

Here FRAME (0x95) marks the start of a new frame and MARK ('(') marks the start of the items that belong to a container object. These values are specified by the pickle protocol (this is an implementation detail).
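
To see what happens when the MARK opcode is altered, here is a rough sketch, again assuming CPython's C unpickler. It locates MARK with pickletools.genops and overwrites it with NONE ('N'); the choice of replacement byte is arbitrary and only meant to simulate corruption.

```python
import pickle
import pickletools

data = {"fruits": ["apple", "banana", "pineapple"]}
payload = pickle.dumps(data, protocol=4)

# Locate MARK by opcode position rather than by searching for b"(",
# because that byte value could also occur inside a length field or a string.
mark_pos = next(
    pos for op, _arg, pos in pickletools.genops(payload) if op.name == "MARK"
)

# Overwrite MARK with NONE ('N'): the stream keeps its length, but no mark is pushed.
damaged = payload[:mark_pos] + b"N" + payload[mark_pos + 1:]

try:
    pickle.loads(damaged)
    error_message = None
except pickle.UnpicklingError as exc:
    error_message = str(exc)

print(error_message)
```

The intact payload still loads normally; only the copy with the overwritten opcode fails, once APPENDS looks for a mark that was never pushed.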

I would first take a look at the content of the pickle file. To summarise, the possible causes are:

  1. The file is somehow corrupted.
  2. The file pointer does not point to the start of the file. This can most likely be fixed by f.seek(0), see _pickle.UnpicklingError: could not find MARK ...
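
The second cause can be reproduced with a short sketch, assuming CPython's C unpickler. An in-memory io.BytesIO stands in for the real file, and the starting offset (just past the MARK opcode) is picked purely for illustration; a stray read position elsewhere in the file may fail with a different error.

```python
import io
import pickle
import pickletools

payload = pickle.dumps(["apple", "banana", "pineapple"], protocol=4)
mark_pos = next(
    pos for op, _arg, pos in pickletools.genops(payload) if op.name == "MARK"
)

stream = io.BytesIO(payload)

# Simulate a file pointer that was left in the middle of the file,
# here right after the MARK opcode.
stream.seek(mark_pos + 1)
try:
    pickle.load(stream)
    midstream_error = None
except pickle.UnpicklingError as exc:
    midstream_error = str(exc)

# Rewinding to the start lets the same stream load correctly.
stream.seek(0)
restored = pickle.load(stream)

print(midstream_error)
print(restored)
```

If f.seek(0) does not help, as in the question, the file content itself is the more likely culprit.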

Found another cause? Feel free to edit the answer.

funnydman

I recently had the same error. Code that had worked flawlessly suddenly started throwing it.

File corruption was the cause: git had replaced CRLF with LF.

Re-downloading the file fixed the problem.

roborg

I had the same issue, and it was caused by using pandas read_pickle() on a file that was actually not a pickle file...
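
A quick sanity check along these lines might catch that case before unpickling. This is only a heuristic sketch: looks_like_binary_pickle is a hypothetical helper, not part of pickle or pandas, and it only recognises uncompressed pickles written with protocol 2 or later, which start with the PROTO opcode (0x80) followed by the protocol number and end with STOP ('.').

```python
import pickle


def looks_like_binary_pickle(raw: bytes) -> bool:
    """Heuristic check for an uncompressed protocol 2+ pickle (hypothetical helper)."""
    return (
        len(raw) >= 3
        and raw[0] == 0x80                           # PROTO opcode
        and 2 <= raw[1] <= pickle.HIGHEST_PROTOCOL   # protocol number
        and raw[-1:] == b"."                         # STOP opcode
    )


real_pickle = pickle.dumps({"fruits": ["apple", "banana"]})
csv_bytes = b"name,count\napple,3\n"

print(looks_like_binary_pickle(real_pickle))
print(looks_like_binary_pickle(csv_bytes))
```

Protocol 0 and 1 pickles, or compressed files, would be rejected by this check even though they can be valid, so it is best used as a hint rather than a gate.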

Peurke