
I have a piece of code that is running a search in a database and based on the results generates a new search. It does this by generating a list with the new search terms. Since the code can run for quite a long time I am trying to back this list (along with a few other lists containing other important values) up with cPickle to recover the search in case of crashes.

The lists themselves can contain millions of entries (each entry is a short string of a few numbers) and can thus get very large.

Should my code crash I can then recover the list when restarting it and carry on with my search as if nothing had happened. My problem, however, is that if the list is too large (or a different process on my machine uses a lot of RAM) I end up receiving a MemoryError.

The code I have written to ensure this doesn't corrupt my previous backup is as follows:

import os, time, traceback
import cPickle as pickle

def pickle_securely(value_pairs, exception_log, current_path):
    os.chdir(current_path)
    # remember the file being processed so the error message can name it
    filename = None
    try:
        # pickle each variable into a temporary file first
        for value in value_pairs:
            var_name = value[0]
            filename = value[1]
            # remove file extension
            filename_body = os.path.splitext(filename)[0]
            filename_temp = filename_body + '.temp'
            with open(filename_temp, 'wb') as filename_temp_pkl:
                pickle.dump(var_name, filename_temp_pkl, -1)

        # once all variables have been pickled successfully, replace the old files
        for value in value_pairs:
            filename = value[1]
            # remove file extension
            filename_body = os.path.splitext(filename)[0]
            filename_temp = filename_body + '.temp'
            # these steps must run for every pair, so they belong inside the loop
            if os.path.isfile(filename):
                os.remove(filename)
            os.rename(filename_temp, filename)
    except Exception:
        serr = traceback.format_exc()
        # name only the file: str() of a list with millions of entries would be huge
        error_message = "Pickling into " + str(filename) + " has failed."
        error_message = "\n" + time.strftime("%d/%m/%Y %H:%M:%S") + "\n" + error_message + "\n" + serr + "\n" + "\n"
        print(error_message)
        exception_log.write(error_message)
        raise

As currently written, the function takes a list of pairs, each consisting of a variable and the name of the file it is to be pickled to. I don't want to hardcode how many variables there are, as this may change depending on where I use the function, which is why I use a list as the outer structure.

Since my lists (with the search results) are very large, however, I would prefer to pass references to those lists to the function rather than a list containing copies of them, to keep my memory usage low.

So far I haven't found a good way of doing this, though admittedly my approach is perhaps not the most elegant. Could somebody point me in the right direction? (Or suggest how I could rewrite the function so that I don't need to?)

EDIT: after a lot of googling I found the question "Python: Get a pointer to a list element", which states that Python has no pointers. All the examples I have been able to find, though, are concerned with using the mutability of some objects to edit them in a subroutine. I don't care about that; I merely want to keep my RAM usage low. Is there some other way?

P-M
  • `list` arguments are passed by assignment, which in fact means a reference to the object is passed. So as far as I know, there is no data copy. – luoluo Sep 11 '15 at 14:20
  • What though if I pass that list wrapped in an outer list? Would I not create a new variable (and thus data) at that point? By wrapping it in a list I can pass, say, five lists to my function for pickling without having to hard code that I am expecting five lists as I can simply iterate through it. – P-M Sep 11 '15 at 14:26
  • Creating a new variable whose value is a list creates a `reference` to a list object. – luoluo Sep 11 '15 at 14:29

1 Answer

Having tried in IPython what @luoluo mentioned in his comment, I can confirm that wrapping several lists in another list does not copy them; the outer list merely holds references to them. While a = range(10**7) and b = range(10**7) each cause an increase in memory use, subsequently calling c = [a, b] causes no further noticeable increase in memory usage.
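The IPython observation above can be approximated in code. What follows is only a rough sketch under a few assumptions: it is written for Python 3 (where `range` is lazy, so `list(range(n))` stands in for the Python 2 `range(n)` used above), it uses the standard-library `tracemalloc` module instead of watching the process memory by eye, and the function name `measure_wrapper_cost` is made up for illustration.

```python
import sys
import tracemalloc

def measure_wrapper_cost(n):
    """Measure how many bytes `c = [a, b]` allocates for two lists of n ints."""
    a = list(range(n))
    b = list(range(n))

    # Start tracing only after a and b exist, so just the wrapper is measured.
    tracemalloc.start()
    before = tracemalloc.get_traced_memory()[0]
    c = [a, b]  # the outer list stores two references, not copies
    after = tracemalloc.get_traced_memory()[0]
    tracemalloc.stop()

    shares_objects = c[0] is a and c[1] is b
    return after - before, sys.getsizeof(a), shares_objects

cost, inner_size, shares_objects = measure_wrapper_cost(10**5)
print("bytes allocated while building c = [a, b]:", cost)
print("bytes used by the list object a alone:", inner_size)
print("c holds the original list objects:", shares_objects)
```

The exact byte counts depend on the interpreter and platform, but the wrapper should cost a tiny fraction of what either inner list occupies.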

Thus, to answer the question, wrapping lists in a list to pass them to a function passes just the references to the lists and not copies of the actual lists.
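The same point can be checked directly with the `is` operator, which compares object identity. This is a minimal sketch written for Python 3; the helper `received_same_objects`, the list names and the file names are hypothetical stand-ins for the pairs passed to `pickle_securely` in the question.

```python
def received_same_objects(value_pairs, originals):
    """Return True if every pair holds the very same list object as its original."""
    return all(pair[0] is original
               for pair, original in zip(value_pairs, originals))

# Hypothetical stand-ins for the large search lists.
search_terms = [str(i) for i in range(1000)]
visited_terms = [str(i) for i in range(500)]

# Same shape as the argument in the question: (variable, file name) pairs.
value_pairs = [(search_terms, "search_terms.pkl"),
               (visited_terms, "visited_terms.pkl")]

# Inside the function the pairs still refer to the original lists.
no_copy = received_same_objects(value_pairs, [search_terms, visited_terms])

# For contrast, slicing creates a new list object with equal contents.
copied = search_terms[:]
slice_is_copy = copied is not search_terms and copied == search_terms

print(no_copy, slice_is_copy)
```

A copy only appears when one is requested explicitly, for example through slicing, `list(...)` or `copy.deepcopy`; building the outer list and calling the function do not create one.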

P-M