
I am not able to load a pickle file. I am using Python 3.5.

import pickle
data=pickle.load(open("D:\\ud120-projects\\final_project\\final_project_dataset.pkl", "r"))

TypeError: a bytes-like object is required, not 'str'


I also tried:

import pickle
data=pickle.load(open("D:\\ud120-projects\\final_project\\final_project_dataset.pkl", "rb"))

UnpicklingError: the STRING opcode argument must be quoted


I get the same error even when using a with statement:

import pickle
with open("D:\\ud120-projects\\final_project\\final_project_dataset.pkl", "rb") as f:
    enron_data = pickle.load(f)
Nimish Bansal

4 Answers


I'm using Windows 10 and VS Code. Open the final_project_dataset.pkl file in the editor, change the line-ending option from CRLF to LF, and then save the file. After that, the "UnpicklingError: the STRING opcode argument must be quoted" error disappears.
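To check whether a file actually contains CRLF line endings before (or after) changing them in the editor, something like the following could be used. This is only a minimal sketch: `count_crlf` is a hypothetical helper name, not part of any library.

```python
def count_crlf(path):
    """Return how many CRLF (carriage return + line feed) pairs the file contains."""
    # Binary mode is required so that Python does not translate newlines on read.
    with open(path, "rb") as f:
        return f.read().count(b"\r\n")
```

A result of 0 for the .pkl file would suggest that its line endings are already LF.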

ehsan maddahi

You definitely need "rb" to read the file, which solves the first problem.

The second issue (the STRING opcode argument error) occurs because the file doesn't have Unix line endings. You need to run the .pkl file through a script to convert them. The thread below describes a tool called "dos2unix" that will do that for you:

How to convert DOS/Windows newline (CRLF) to Unix newline (\n) in a Bash script?
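Both errors can be reproduced without the dataset, which may help show why each fix is needed. The sketch below assumes Python 3 and uses a tiny hand-written protocol-0 pickle; the names `LF_PICKLE`, `CRLF_PICKLE` and `try_load` are made up for illustration.

```python
import io
import pickle

# Hand-written text-protocol pickle of the string 'abc', using the STRING ("S")
# opcode that Python 2 emits for str objects.
LF_PICKLE = b"S'abc'\n."
# The same pickle after its line endings have been converted to CRLF.
CRLF_PICKLE = LF_PICKLE.replace(b"\n", b"\r\n")


def try_load(stream):
    """Return the unpickled value, or the name of the exception that was raised."""
    try:
        return pickle.load(stream)
    except (TypeError, pickle.UnpicklingError) as exc:
        return type(exc).__name__
```

Here `try_load(io.StringIO(LF_PICKLE.decode("ascii")))` stands in for opening the file with "r" and should report a TypeError, `try_load(io.BytesIO(CRLF_PICKLE))` stands in for "rb" on a CRLF file and should report an UnpicklingError, and `try_load(io.BytesIO(LF_PICKLE))` should return the string.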

Dale
  • Are you using this dataset? https://github.com/udacity/ud120-projects/tree/master/final_project – Dale Jul 31 '17 at 16:07
  • I copied and pasted the bottom version (with open...) and it worked perfectly in both Python 2.7 and 3.6 on a Mac. Are you doing something besides the code above? – Dale Jul 31 '17 at 16:15
  • I am using https://github.com/udacity/ud120-projects/blob/master/tools/email_authors.pkl, which is slightly different from the one you mentioned, and I am working on Windows with Python 3.5. – Nimish Bansal Jul 31 '17 at 17:34
  • Maybe it's a Windows issue. It works perfectly on OS X for me. – Dale Aug 01 '17 at 23:59

If an entire script puts you off, the conversion really only deserves a couple of lines:

import pickle
with open("D:\\path\\to\\file.data", "rb") as f:
    lines = [line.rstrip("\r\n") for line in f.readlines()]
    data = pickle.loads("\n".join(lines))
ToonAlfrink
  • I really like how concise this is, but the list comprehension line throws `TypeError: a bytes-like object is required, not 'str'` when I try this. I'm using `'rb'` in `open()`. Any suggestions? I'm using Python 3.8.5, Windows 10, Jupyter Notebook. – Kaleb Coberly Feb 07 '21 at 20:08
  • The 'readlines()' output is a list of binary items (e.g. `[b'(lp0\r\n', b"S' sbaile2'\r\n"]`), and `rstrip()` seems to be unable to handle it; `f.readlines()[0].rstrip('\r\n')` throws the same error. Is there a concise way to convert the binary items to strings, then convert them back to binary after replacing their line endings? – Kaleb Coberly Feb 07 '21 at 20:46
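The TypeError described in these comments comes from mixing bytes and str: a file opened with "rb" yields bytes, so the strip and join arguments need to be bytes literals too, and no conversion to strings is required. A minimal sketch of the same idea for Python 3 follows; the function name `load_crlf_pickle` is hypothetical.

```python
import pickle


def load_crlf_pickle(path):
    """Load a text-protocol pickle whose line endings were changed to CRLF."""
    with open(path, "rb") as f:
        # Each line is a bytes object, so strip with a bytes literal (b"...").
        lines = [line.rstrip(b"\r\n") for line in f]
    # Rejoin with Unix newlines and unpickle from memory.
    return pickle.loads(b"\n".join(lines))
```

This leaves the file on disk untouched and only normalizes the line endings in memory.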

The only fix that worked for me was this one (answered by Monkshow92 on GitHub):

"The pickle file has to use Unix newlines, otherwise at least Python 3.4's C pickle parser fails with the exception: pickle.UnpicklingError: the STRING opcode argument must be quoted. I think that some git versions may be changing the Unix newlines ('\n') to DOS newlines ('\r\n').

You may use this code to change "word_data.pkl" to "word_data_unix.pkl" and then use the new .pkl file on the script "nb_author_id.py": dos2unix.txt

#!/usr/bin/env python
"""
Convert DOS line endings (CRLF) to Unix line endings (LF).
usage: dos2unix.py
"""
original = "word_data.pkl"
destination = "word_data_unix.pkl"

outsize = 0
# Read the whole file as bytes so that no newline translation happens on read.
with open(original, 'rb') as infile:
    content = infile.read()
# Write each line back out terminated by a single LF byte.
with open(destination, 'wb') as output:
    for line in content.splitlines():
        outsize += len(line) + 1
        output.write(line + b'\n')

print("Done. Saved %s bytes." % (len(content) - outsize))

dos2unix.py adapted from: http://stackoverflow.com/a/19702943"

The small tweak I found is to change the "r" mode to "rb" (binary mode), and then to convert all the .pkl files from DOS to Unix line endings using the Python script above.

Answer link: https://github.com/udacity/ud120-projects/issues/46
Full credit: Monkshow92

Adam
RoshanADK