
I'm having a strange issue where Python will successfully find and read a binary file that exists, but pickle.load() will not. pickle.load() is throwing a FileNotFoundError which doesn't make much sense. I know for a fact the file is there because if I try to read the contents of the file I'm able to.

import pickle

try:
    with open("test", "rb") as f:
        print(f.read())  # proves the file is readable, but leaves the position at end-of-file
        data = pickle.load(f)

except FileNotFoundError as e:
    print(e)

I've been trying to wrap my head around this for a few hours now and I just can't understand what's going on here. I've had my fair share of Python and never had this happen to me. Working on Windows 10 with VSCode and WSL (Ubuntu 20.04).

EDIT: I know this particular code won't work because I'm reading with f.read() first. I only put it there to show that the file can be read; I really only want to pickle.load() it.
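As the comments point out, `f.read()` leaves the file position at end-of-file, so a `pickle.load(f)` right after it would raise `EOFError`, not `FileNotFoundError`. A minimal sketch of both behaviours, assuming a throwaway pickle written to a temporary directory (the helper names `read_then_load` and `load_without_rewind` are made up for illustration; the file name `test` is reused from the snippet above):

```python
import os
import pickle
import tempfile


def read_then_load(path):
    """Print-style read of the raw bytes, then unpickle the same open file."""
    with open(path, "rb") as f:
        raw = f.read()          # file position is now at end-of-file
        f.seek(0)               # rewind, otherwise pickle.load() raises EOFError
        data = pickle.load(f)
    return raw, data


def load_without_rewind(path):
    """Reproduce the mistake: read everything, then try to unpickle."""
    with open(path, "rb") as f:
        f.read()
        return pickle.load(f)   # nothing left to read


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "test")
        with open(path, "wb") as f:
            pickle.dump({"answer": 42}, f)

        raw, data = read_then_load(path)
        print(len(raw) > 0, data)    # True {'answer': 42}

        try:
            load_without_rewind(path)
        except EOFError as e:
            print("EOFError:", e)    # EOFError: Ran out of input
```

So an `EOFError` would point at the read/seek issue; a `FileNotFoundError` has to come from somewhere else.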

EDIT: The traceback is as follows:

Traceback (most recent call last):
  File "/mnt/d/_/Projects/FCUL/SO/pgrepwc/v2/hpgrepwc.py", line 34, in main
    data = pickle.load(f)
  File "/usr/lib/python3.8/multiprocessing/managers.py", line 959, in RebuildProxy
    return func(token, serializer, incref=incref, **kwds)
  File "/usr/lib/python3.8/multiprocessing/managers.py", line 809, in __init__
    self._incref()
  File "/usr/lib/python3.8/multiprocessing/managers.py", line 863, in _incref
    conn = self._Client(self._token.address, authkey=self._authkey)
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 502, in Client
    c = SocketClient(address)
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 630, in SocketClient
    s.connect(address)
FileNotFoundError: [Errno 2] No such file or directory
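The last frames of this traceback are inside `pickle.load`, not `open()`, so the error is raised by code that runs while objects are being rebuilt. Unpickling imports classes and recreates instances, so any class that touches an external resource during reconstruction can raise `FileNotFoundError` even though the pickle file itself is fine. A minimal sketch with a hypothetical `ConfigReader` class (not from the OP's code) that reopens its file when unpickled:

```python
import os
import pickle
import tempfile


class ConfigReader:
    """Hypothetical class that keeps an open handle to a config file."""

    def __init__(self, path):
        self.path = path
        self.handle = open(path, "r")

    def __getstate__(self):
        # Open file objects can't be pickled, so only the path is saved...
        return {"path": self.path}

    def __setstate__(self, state):
        # ...and the file is reopened while pickle.load() rebuilds the object.
        self.path = state["path"]
        self.handle = open(self.path, "r")


def demo():
    """Return the error raised while unpickling, or None if it loaded."""
    with tempfile.TemporaryDirectory() as tmp:
        config_path = os.path.join(tmp, "settings.cfg")
        with open(config_path, "w") as f:
            f.write("verbose = 1\n")

        reader = ConfigReader(config_path)
        blob = pickle.dumps(reader)      # only the path is stored
        reader.handle.close()
        os.remove(config_path)           # the external file goes away

        try:
            pickle.loads(blob)           # the pickle bytes themselves are intact
        except FileNotFoundError as e:
            return e
    return None


if __name__ == "__main__":
    print(repr(demo()))
```

A `multiprocessing` proxy behaves the same way in spirit: its reconstruction tries to reach something (here a file, there a manager's socket) that no longer exists.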

And as requested, my directory listing:

pgrepwc:
    |
    |   histFile1
    |   histFile2
    |   .gitattributes
    |   .gitignore
    |   testFile
    |
    +---.vscode
    |       launch.json
    |       settings.json
    |
    |
    \---v2
        |
        |   testFile
        |   histFile
        |   hpgrepwc.py
        |   Load.py
        |   Match.py
        |   pgrepwc_v2.py
        \-- README.txt

The file I'm executing is hpgrepwc.py in folder v2. The file I'm trying to read is the binary file testFile. I've noticed that even though my script is in folder v2, the working directory sometimes defaults to pgrepwc, so I even placed a copy of testFile there just in case. No dice either way; I've also tried saving the file as .bin, to no avail.
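A relative name such as "test" or "testFile" is resolved against the current working directory, not the folder that holds the script, which is why the lookup sometimes lands in pgrepwc. A minimal sketch that anchors the path to the script's own directory, assuming the data file sits next to hpgrepwc.py in v2 (`DATA_FILE` and `load_data` are illustrative names):

```python
import pickle
from pathlib import Path

# Build the path from the script's own location instead of relying on the
# current working directory, which changes with where the script is launched.
SCRIPT_DIR = Path(__file__).resolve().parent
DATA_FILE = SCRIPT_DIR / "testFile"   # file name taken from the question


def load_data(path=DATA_FILE):
    """Unpickle the object stored at `path`."""
    with open(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    print("Current working directory:", Path.cwd())
    print("Loading:", DATA_FILE)
    print(load_data())
```

This removes one source of `FileNotFoundError`, but only for errors raised by `open()` itself.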

SOLUTION:

@tdelaney mentioned:

"(...) it looks like some object created in a multiprocessing.Manager was pickled. But these objects are actually proxies that broadcast changes to a group of subprocesses and are not valid outside of that context. In your case, the unpickler tried to reconstruct a class that tried to reconnect to its long-dead multiprocessing partners. You need to look to the code doing the pickling and figure out some other way to encapsulate the data."

This was exactly it. I make heavy use of multiprocessing.Manager for shared memory data structures in my code. After converting a manager.dict() to a regular Python dict, the pickling and unpickling worked like a charm. Once again, thanks to everyone who contributed and especially @tdelaney.
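A minimal sketch that reproduces the failure and the fix in one script. Assumptions: Linux/WSL (as in the question) and the `fork` start method, chosen only so the example runs as a single file (it is not available on Windows); the key `matches` and the function `pickle_shared_dict` are made up for illustration. The proxy is pickled while the manager is running and unpickled after it has shut down, which mirrors dumping the file in one run and loading it in another.

```python
import multiprocessing
import pickle

# Assumption: Linux/WSL. "fork" keeps this a single script; it is not
# available on Windows.
ctx = multiprocessing.get_context("fork")


def pickle_shared_dict(convert_first):
    """Pickle a Manager dict while the manager runs; unpickle after it stops."""
    with ctx.Manager() as manager:
        shared = manager.dict()
        shared["matches"] = 3
        # dict(shared) copies the contents into an ordinary dict.
        obj = dict(shared) if convert_first else shared
        blob = pickle.dumps(obj)
    # The manager's server process has exited at this point.
    return pickle.loads(blob)


if __name__ == "__main__":
    print(pickle_shared_dict(convert_first=True))   # {'matches': 3}
    try:
        pickle_shared_dict(convert_first=False)
    except OSError as e:
        # On Linux this is typically FileNotFoundError (the proxy tries to
        # reconnect to the manager's Unix socket, which is gone), matching the
        # question's traceback; other setups may raise e.g. ConnectionRefusedError.
        print(type(e).__name__, e)
```

The proxy pickles without complaint; the failure only shows up at load time, which is what made the error so confusing.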

zeval
  • Can you post the full traceback message? – tdelaney Dec 05 '20 at 17:22
  • The file you use should be of the extension `.pickle`. So re-dump the pickle under the name of `test.pickle` then open `test.pickle`. – Navaneeth Reddy Dec 05 '20 at 17:26
  • @NavaneethReddy - pickle doesn't know or care about the file name. It's an open file; pickle reads it. – tdelaney Dec 05 '20 at 17:26
  • This code won't work because you've read to the end of the file before trying to unpickle. That should result in an EOFError, not FileNotFoundError. If the error is on the `open` call, then the problem is just that there isn't a file called "test" in the current working directory. I asked for the traceback so that we can see both the full error message and the failing line. – tdelaney Dec 05 '20 at 17:28
  • @tdelaney if you have a file without any extension, that is considered a folder. That's why programmers choose the extension `.pickle` by convention. – Navaneeth Reddy Dec 05 '20 at 17:29
  • Are you sure the exception being thrown is `FileNotFoundError` and not `EOFError`? Trying to replicate what you did simply gives me an `EOFError` because I assume that the call to `f.read()` moves the file pointer to EOF. Removing the call to `f.read()` was enough to get it working. – Dakshraj Sharma Dec 05 '20 at 17:30
  • @NavaneethReddy - That is not correct. Files do not need extensions on them. Microsoft Windows lets you associate programs to files via the file extension and will not execute code unless it has an `.exe` or other well-defined extension. Unix-like systems don't work that way and don't usually care what the extension (if any) is. In this case, OP opened the file and passed the file handle to pickle. Pickle just reads that file and doesn't care in the least where it came from. – tdelaney Dec 05 '20 at 17:31
  • @DakshrajSharma This code will not throw an EOF error since `open('file.name', 'rb') as f` doesn't actually read the file. – Navaneeth Reddy Dec 05 '20 at 17:32
  • @DakshrajSharma Also `.read()` is a blocking function and no buffer is specified. So the program will not resume till `.read()` is complete. – Navaneeth Reddy Dec 05 '20 at 17:34
  • @DakshrajSharma As I'm only catching that error, I'm sure it's FileNotFoundError. @tdelaney I would, but it actually doesn't give me any, just `"[Errno 2] No such file or directory: testFile"` – zeval Dec 05 '20 at 17:34
  • @NavaneethReddy, you're right it doesn't. But the very next line `print(f.read())` does. – Dakshraj Sharma Dec 05 '20 at 17:35
  • @zeval - but the read worked? You could add `import traceback;traceback.print_exc()` to your exception handler to print the traceback so that we can see where the program failed. – tdelaney Dec 05 '20 at 17:36
  • @Zeval a listing of your directory contents and some more code context could be helpful! – Dakshraj Sharma Dec 05 '20 at 17:36
  • @Zeval I have just tested `open` and I am right, you absolutely need to specify the file extension along with the name. You can only exclude file extensions when the file itself doesn't have an extension. – Navaneeth Reddy Dec 05 '20 at 17:39
  • The read works, that's what's weird. I'll try to get the traceback and report back. – zeval Dec 05 '20 at 17:39
  • Pickle needs to import modules and create class instances during load. It's possible that you have a class that attempts to open a file in that process. That would fail if the file didn't exist in the unpickling environment and basically means that the class is unpickleable. We would see the error deep in the bowels of pickle in the traceback in that case. – tdelaney Dec 05 '20 at 17:40
  • @Zeval Well the function doesn't actually get to `.read()` anything since the `open` line above is throwing a `FileNotFound` error. – Navaneeth Reddy Dec 05 '20 at 17:45
  • @Zeval So technically the read is not reachable but will work once the file name is fixed. – Navaneeth Reddy Dec 05 '20 at 17:45
  • @Zeval are you using multiprocessing/processes in your code? Could this be relevant: https://stackoverflow.com/questions/56641428/python-3-6-nested-multiprocessing-managers-cause-filenotfounderror. The error seems somewhat relevant to the one described in this question. – Dakshraj Sharma Dec 05 '20 at 18:00
  • With that edit, it looks like some object created in a `multiprocessing.Manager` was pickled. But these objects are actually proxies that broadcast changes to a group of subprocesses and are not valid outside of that context. In your case, the unpickler tried to reconstruct a class that tried to reconnect to its long-dead multiprocessing partners. You need to look to the code doing the pickling and figure out some other way to encapsulate the data. – tdelaney Dec 05 '20 at 18:00
  • @tdelaney Yes, that might be the problem, too. – Navaneeth Reddy Dec 05 '20 at 18:13
  • @tdelaney Thank you! That makes perfect sense!! I make heavy use of `multiprocessing.Manager` on my main program, to make a dictionary available to all subprocesses, which works wonderfully fast and a lot better than all the other alternatives I've tried, it's absolutely invaluable to the code. What I could do is try to convert the `multiprocessing.Manager().dict()` to a regular dictionary. I'll try that, thank you for pointing me in the right direction. – zeval Dec 05 '20 at 18:16
  • From the traceback we see `File "/usr/lib/python3.8/multiprocessing/managers.py", line 959, in RebuildProxy` - so a multiprocessing manager was part of what was pickled. The problem is on the side that did `pickle.dump`. It pickled an object that can't be unpickled. We need to look at that code to figure this out. – tdelaney Dec 05 '20 at 18:16
  • Great to hear. I think that is the solution. – tdelaney Dec 05 '20 at 18:17
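Following tdelaney's suggestion, printing the full traceback (rather than only the exception message) shows which call actually raised. A small sketch, with a hypothetical helper name `load_pickle`, that also keeps `open()` and `pickle.load()` in separate `try` blocks so the two failure points can't be confused:

```python
import pickle
import traceback


def load_pickle(path):
    """Load a pickle, reporting which step failed and the full traceback."""
    try:
        f = open(path, "rb")
    except FileNotFoundError:
        print(f"open() failed: {path!r} does not exist")
        raise
    with f:
        try:
            return pickle.load(f)
        except Exception:
            # Prints every frame, so errors raised while pickle rebuilds
            # objects (e.g. inside multiprocessing/managers.py) are visible.
            traceback.print_exc()
            raise
```

With this split, a `FileNotFoundError` that carries a multiprocessing traceback is clearly an unpickling problem, not a missing data file.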

2 Answers


Make sure you do not have multiprocessing.Manager objects in your class or object. I had a multiprocessing.Manager().dict() in my object. So I replaced that with a normal dict and everything works fine now.
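The same idea applies when the Manager dict lives inside an object: give the class a `__getstate__` that swaps the proxy for a plain-dict snapshot. A minimal sketch; `SearchResults` and its `counts` attribute are hypothetical, and it assumes Linux/WSL with the `fork` start method so it runs as one script:

```python
import multiprocessing
import pickle

# Assumption: Linux/WSL with the "fork" start method (not available on Windows).
ctx = multiprocessing.get_context("fork")


class SearchResults:
    """Hypothetical object that shares a Manager dict with worker processes."""

    def __init__(self, manager):
        self.counts = manager.dict()   # proxy: only valid while the manager runs

    def __getstate__(self):
        state = self.__dict__.copy()
        # Store a plain-dict snapshot instead of the proxy.
        state["counts"] = dict(self.counts)
        return state


if __name__ == "__main__":
    with ctx.Manager() as manager:
        results = SearchResults(manager)
        results.counts["testFile"] = 7
        blob = pickle.dumps(results)

    restored = pickle.loads(blob)   # works after the manager has shut down
    print(restored.counts)          # {'testFile': 7}
```

Without a `__setstate__`, pickle simply restores the returned dict as the instance's attributes, so the loaded object holds an ordinary `dict`.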

Rick Vink

Pickling to a file with a .pkl or .pickle extension is recommended.

So re-dump the file with an extension first.

import pickle

with open('test.pickle', 'wb') as f:
    pickle.dump(your_contents, f)  # your_contents: whatever object you originally pickled

then change the loading part to this:

try:
    with open("test.pickle", "rb") as f:
        print(f.read())
        f.seek(0)  # rewind after read(), otherwise pickle.load() raises EOFError
        data = pickle.load(f)

except FileNotFoundError as e:
    print(e)

Python's open function will read files with extensions. You can only leave out the extension if the file itself doesn't have one.

Navaneeth Reddy
  • No, pickle doesn't care what the file name is. It could be `ThisIsNotAPickleFile.TrustMe` and would read just fine. `pickle.load` doesn't actually open files; it just requires an object with a couple of methods. From the docs: _The argument *file* must have two methods, a read() method that takes an integer argument, and a readline() method that requires no arguments._ You can use a reader that pulls in web content, a reader that decompresses first, etc. Pickle does not know the file name and has no interaction with the file name in any way. – tdelaney Dec 05 '20 at 17:52
  • Sorry, this didn't work for me. I tried `.bin`, `.pickle`, `.pkl`, none of them worked, but then again it's UNIX so that wouldn't be the problem. – zeval Dec 05 '20 at 17:56
  • @tdelaney That's what I was telling you in the previous comment section. Pickle doesn't care about the file extension, but the `.pickle` extension is used just by convention. Also, on Windows, `open` will only read files if the extension is specified. – Navaneeth Reddy Dec 05 '20 at 17:57
  • @Zeval If you are on Unix, try cPickle instead of pickle. – Navaneeth Reddy Dec 05 '20 at 18:00
  • @tdelaney Yes, true, pickle doesn't care about the filename, but you need to pass a reference to that file into load. How else would you do that, other than by providing the full file name with the extension to the `open` function? – Navaneeth Reddy Dec 05 '20 at 18:01
  • @NavaneethReddy I've used pickle in other projects before, and right now I'm using custom objects inside the pickled object which I'm not sure cPickle supports. – zeval Dec 05 '20 at 18:03
  • @tdelaney I never said `pickle` cares and doesn't care about the file name at the same time. Yes, the file name can be anything, but to actually `load` it, you have to have a reference to the object in the second parameter of `load`, and that parameter has to be an object of `open`. Try this code first, then we can discuss the errors. – Navaneeth Reddy Dec 05 '20 at 18:06
  • @Zeval I am not sure either, but I have seen Unix users using `cPickle` instead of `pickle`. I use `pickle` since I'm a Windows user. But we are programmers; we just try stuff until it works. – Navaneeth Reddy Dec 05 '20 at 18:08
  • @NavaneethReddy True that. And I thank you for the suggestion, but I'm obliged to either use pickle or struct for this task, and I'd much rather go with pickle. – zeval Dec 05 '20 at 18:09
  • @tdelaney To be fair, I don't really understand what's going on from the traceback, sorry. – zeval Dec 05 '20 at 18:10
  • @Zeval - let's discuss that above. – tdelaney Dec 05 '20 at 18:11
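tdelaney's point about file names can be checked directly. A sketch that writes the same object under an extension-less name, a `.pickle` name, and the joke name from the comment, loads each one, and then loads from an in-memory `io.BytesIO` buffer that has no file name at all (the `payload` contents are made up):

```python
import io
import os
import pickle
import tempfile

payload = {"pattern": "foo", "hits": [1, 5, 9]}
loaded = []

with tempfile.TemporaryDirectory() as tmp:
    # The same bytes under three different names; pickle never sees the name,
    # only the file object that open() returns.
    for name in ("testFile", "testFile.pickle", "ThisIsNotAPickleFile.TrustMe"):
        path = os.path.join(tmp, name)
        with open(path, "wb") as f:
            pickle.dump(payload, f)
        with open(path, "rb") as f:
            loaded.append(pickle.load(f))

# Any binary file-like object works too, e.g. an in-memory buffer.
buffer = io.BytesIO(pickle.dumps(payload))
loaded.append(pickle.load(buffer))

print(all(item == payload for item in loaded))   # True
```

All four loads return the same object, so the extension (or lack of one) was never the cause of the error.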