I have a data file saved with the shelve module in Python 2.7 that is somehow corrupt. I can open it with db = shelve.open('file.db'), but calling len(db) or even bool(db) hangs, and I have to kill the process.

However, I am able to loop through the entire thing and create a new non-corrupt file:

import shelve

db = shelve.open('orig.db')
db2 = shelve.open('copy.db')
# Iterating over the items works even though len(db) hangs
for k, v in db.items():
    db2[k] = v
db2.close()  # copy.db will now be a fully working copy
db.close()

The question is: how can I test the shelf for corruption without triggering the hang?

BTW, I still have the original file, and it exhibits the same behaviour when copied to other machines, in case someone also wants to help me get to the bottom of what's actually wrong with the file in the first place!

crazystick
  • Not sure about inspection. Maybe try opening it with different pickle protocols (http://stackoverflow.com/questions/23582489/python-pickle-protocol-choice), and do it in a subprocess that you can time out. – brennan Apr 14 '17 at 11:16

1 Answer

I'm not aware of any inspection method other than dbm.whichdb(). To debug a possible pickle protocol mismatch in a way that lets you time out long-running tests, you could try the following (written for Python 3; it also needs the third-party psutil package):

import shelve
import pickle
import dbm
import multiprocessing
import psutil

def protocol_check():
    # whichdb() returns None if the file cannot be opened, '' if the format is unknown
    print('orig.db is', dbm.whichdb('orig.db'))
    print('copy.db is', dbm.whichdb('copy.db'))
    for p in range(pickle.HIGHEST_PROTOCOL + 1):
        print('trying protocol', p)
        db = shelve.open('orig.db', protocol=p)
        db2 = shelve.open('copy.db')
        try:
            for k, v in db.items():
                db2[k] = v
        finally:
            db2.close()
            db.close()
        print('great success on', p)

def terminate(grace_period=2):
    # Ask every child process to exit, then kill any that outlive the grace period
    procs = psutil.Process().children()
    for p in procs:
        p.terminate()
    gone, still_alive = psutil.wait_procs(procs, timeout=grace_period)
    for p in still_alive:
        p.kill()

# The guard is required where multiprocessing spawns a fresh interpreter (e.g. Windows)
if __name__ == '__main__':
    process = multiprocessing.Process(target=protocol_check)
    process.start()
    process.join(10)  # wait up to 10 seconds for the check to finish
    if process.is_alive():
        terminate()
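
The same timeout idea can be turned into a direct yes/no test for the hang described in the question. This is only a sketch under some assumptions: it targets Python 3.5+ (subprocess.run does not exist in 2.7), the helper name shelf_len_or_none and the 10-second default are made up for illustration, and a timeout is simply treated as "probably corrupt". It runs len(db) in a separate interpreter, so the calling process never blocks:

```python
import subprocess
import sys

# Script executed in the child interpreter: open the shelf read-only and call len().
_CHECK_SCRIPT = (
    "import shelve, sys\n"
    "db = shelve.open(sys.argv[1], flag='r')\n"
    "print(len(db))\n"
    "db.close()\n"
)

def shelf_len_or_none(path, timeout=10):
    """Return len() of the shelf at path, or None if the check hangs or fails.

    Hypothetical helper, not part of shelve or dbm.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", _CHECK_SCRIPT, path],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout,  # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return None  # len() did not return in time: treat the file as suspect
    if result.returncode != 0:
        return None  # the child raised, e.g. the file could not be opened
    return int(result.stdout.decode().strip())
```

Calling shelf_len_or_none('orig.db') should then come back with None after the timeout instead of hanging, while a healthy file such as copy.db returns its item count.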
brennan
  • Unfortunately, whichdb() returns 'dbhash' for both the working and the corrupt db. Also, in Python 2.7 whichdb lives in the whichdb module. I like the timeout idea, though. – crazystick Apr 20 '17 at 12:12
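
Following up on the module location mentioned in the last comment, one possible way to make the whichdb() call work on both Python 3 and Python 2.7 is a fallback import. This is a small sketch, and as noted above the result may not distinguish a corrupt file from a working one:

```python
try:
    from dbm import whichdb  # Python 3: whichdb is a function in the dbm package
except ImportError:
    from whichdb import whichdb  # Python 2.7: it lives in its own whichdb module

# Returns the backend name as a string, '' if the format cannot be guessed,
# or None if the file cannot be opened.
print('orig.db is', whichdb('orig.db'))
```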