I have a Python BSDDB database which is, obviously, stored on the hard drive. When I remove some entries, the file on the drive does not get any smaller (and consequently it grows quite fast...).
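To show it isn't something specific to my code, here is a minimal sketch of what I mean (assuming the stdlib bsddb module on Python 2; the path and counts are just throwaway examples):

import os
import bsddb

path = "/tmp/shrink_test.db"        # throwaway file, the name is mine
db = bsddb.hashopen(path, "c")
for i in range(10000):
    db[str(i)] = "x" * 100          # roughly 1 MB of payload
db.sync()
print os.path.getsize(path)         # size after the inserts

for i in range(10000):
    del db[str(i)]
db.sync()
print os.path.getsize(path)         # the file is just as big after deleting everything

In the real application the deletion looks like this: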
# excerpt from a method; cPickle is imported at module level
utDelList = []
urlsDelList = []
# collect the keys first so we don't delete from a table while iterating it
for ut in self.urls2task.keys():
    uid = ut.split(":")[1]                       # keys look like "<prefix>:<uid>"
    url = cPickle.loads(self.urls[int(uid)])
    urlsDelList.append(uid)
    utDelList.append(ut)
    del self.urlsDepth[uid]
    del self.urlsStatus[uid]
    del url                                      # only drops the local reference
for ut in utDelList:
    del self.urls2task[ut]
for uid in urlsDelList:
    del self.urls[int(uid)]
(...)
#synchronize all files
self.sync()
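For context, the values go into the tables pickled; the writing side looks roughly like this (a simplified sketch, url_object is a stand-in name for whatever actually gets stored):

self.urls[int(uid)] = cPickle.dumps(url_object)  # url_object is a stand-in name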
My last hope was to force the flush in a savage way, by closing the files and opening them again:
#close all files & start them again, e.g.
self.tasks.close()
self.urls2task.close()
self.tasks = bsddb.rnopen(filepath)
self.urls2task = bsddb.hashopen(filepath2)       # filepath2: the hash table's own file
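Even in the small repro at the top of the post, a full close/reopen roundtrip makes no difference:

# continuing the sketch from above
db.close()
db = bsddb.hashopen(path)
print os.path.getsize(path)         # still the same size as before the deletes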
The crucial element here is the self.tasks table; it grows the fastest and biggest of all the files. Does storing the values pickled change in any way how they should be removed? And, once again: why do the files still keep the entries after removing them? I'd be grateful for any suggestions (first post here :))