
I scraped a large amount of data from a database and saved it as "first_database.db" using Python's shelve module (I'm using Python 3.4). I've had problems with shelve before (see my old issues), which IIRC were probably due to something relating to my ancient OS (OSX 10.9.4) and gdbm/dbm.gnu.

Now I have a more intractable problem: I made a new file that's ~170 MB, and I can only access a single key/value, no matter what.

I know the superset of possible keys, and trying to access any of them except one gives me a KeyError. When I save the value of that one accessible key as a new shelve database, the file is only 16 KB, so I know the rest of the data is in the 170 MB file, but I can't access it.

Am I just screwed?

Furthermore, I have made a copy of the database and tried to add more keys to it (~95). That database will say that it has three keys, but when I try to access the value of the third one, I get the following error:

  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/shelve.py", line 114, in __getitem__
    value = Unpickler(f).load()
_pickle.UnpicklingError: invalid load key, ''.

Zeke

1 Answer


I don't know the issue, but maybe this alternative might help you:

https://github.com/dagnelies/pysos

It's like shelve but does not rely on an underlying implementation and saves its data in plain text. That way, you could even open the DB file to inspect its content if something unexpected occurs.
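To illustrate why a plain-text format is easier to recover from, here is a minimal sketch that uses only the standard library's json module, with one record per line. This is not pysos's API; the function names `save_records` and `load_records` are made up for the example, and it assumes your keys are strings and your values are JSON-serializable.

```python
import json


def save_records(path, records):
    """Write each key/value pair as one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for key, value in records.items():
            f.write(json.dumps({"key": key, "value": value}) + "\n")


def load_records(path):
    """Read the file back, skipping any line that fails to parse."""
    records = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            try:
                entry = json.loads(line)
            except ValueError:
                # A damaged line costs one record, not the whole file.
                continue
            records[entry["key"]] = entry["value"]
    return records
```

Because every record sits on its own line, you can open the file in a text editor and see exactly which entries are intact.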

Note also that shelve relies on an underlying dbm implementation. That means that if you saved your shelf on Linux, for instance, you might not be able to read it on a Mac if the dbm implementation there differs (there are several of them).
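If you want to check which dbm implementation a file was written with, the standard library's `dbm.whichdb` can tell you. The sketch below is only a rough diagnostic: `describe_shelf` is a hypothetical helper name, and on a badly corrupted file even listing the keys may fail. Depending on the backend, you may need to pass the name without the `.db` extension.

```python
import dbm
import shelve


def describe_shelf(path):
    """Return the dbm backend name and the keys whose values can be read."""
    # whichdb returns None if the file cannot be opened, "" if the format
    # is not recognized, and otherwise a module name such as "dbm.gnu".
    backend = dbm.whichdb(path)
    if not backend:
        return backend, []

    readable = []
    with shelve.open(path, flag="r") as db:
        for key in db.keys():
            try:
                db[key]  # forces the value to be fetched and unpickled
            except Exception:
                continue  # skip entries that cannot be loaded
            readable.append(key)
    return backend, readable
```

If the reported backend is not available on the machine where you are reading the file, that alone can explain missing keys or unpickling errors.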

dagnelies
  • Thanks. I ended up just using JSON, which actually made things way easier for other reasons as well – Zeke Oct 24 '18 at 14:35