I have a piece of code that is running a search in a database and based on the results generates a new search. It does this by generating a list with the new search terms. Since the code can run for quite a long time I am trying to back this list (along with a few other lists containing other important values) up with cPickle to recover the search in case of crashes.
The lists themselves can contain in the million of entries (each entry is a short string of a few numbers) and thus can get very large.
Should my code crash I can then recover the list when restarting it and carry on with my search as if nothing had happened. My problem, however, is that if the list is too large (or a different process on my machine uses a lot of RAM) I end up receiving a MemoryError.
The code I have written to ensure this doesn't corrupt my previous backup is as follows:
import os, time
import cPickle as pickle
def pickle_securely(value_pairs, exception_log, current_path):
os.chdir(current_path)
try:
# pickle variables
for value in value_pairs:
filename = value[1]
var_name = value[0]
#remove file extension
filename_body = filename.split('.')[0]
filename_temp = filename_body + '.temp'
with open(filename_temp, 'wb') as filename_temp_pkl:
pickle.dump(var_name, filename_temp_pkl, -1)
# once all variables have been pickled successfully, replace the old files
for value in value_pairs:
filename = value[1]
var_name = value[0]
#remove file extension
filename_body = filename.split('.')[0]
filename_temp = filename_body + '.temp'
if os.path.isfile(filename) == True:
os.remove(filename)
os.rename(filename_temp, filename)
except Exception as e:
s = traceback.format_exc()
serr = "%s" % (s)
error_message = "Pickling " + str(var_name) + " into " + str(backup_name) + " has failed."
error_message = "\n" + time.strftime("%d/%m/%Y %H:%M:%S") + "\n" + error_message + serr +"\n" + "\n"
print error_message
exception_log.write(error_message)
raise
The way I have currently written it I thus pass a list containing pairs of the variable and the name of the file it is to be pickled to to the function (I obviously don't want to hardcode how many variables I will have as this might change depending on where I use this function which is why I use a list as outer structure).
Seeing as my lists (with the search results) however are very large I would prefer passing a reference to those lists to the function rather than a list containing copies of all those lists to keep my memory usage low.
So far I haven't found a good way of doing this yet, though, admittedly, perhaps my approach is not the most elegant. Could somebody perhaps point me in the right direction? (Or suggest a way how I can re-write the function so I don't need it?)
EDIT: after a lot of googling I found this article Python: Get a pointer to a list element stating that Python had no pointers. All examples that I have been able to find though seem to concern themselves with using the mutability of some objects to edit them in some sub-routine. I however don't care about that but merely want to keep my RAM usage low. Is there some other way?