
My basic Python code iterates through 2,000,000 parameter combinations, which are passed to a third-party Fortran executable. The Fortran program models the desired output and writes one .csv file of 54 KB per run.

At the beginning, I get about 100 outputs per second. At this speed, the calculation should be done in 20,000 s, or roughly 5.5 h (which is great!). Unfortunately, the process steadily becomes slower with each iteration. After 10,000 outputs (0.5%) I am already down to 20 outputs per second. After model run #30,000 it is 2 outputs per second, and when I looked at my computer one day later, I was at run #60,000 with less than 1 run per second. Needless to say, I had to abort the process.

Now I am wondering what causes such a heavy decrease in calculation speed. After shutting down the computer and re-starting the process where I last killed it, it now runs as fast as usual - so I assume it's got to do with some sort of memory overflow.

Question is: can I somehow "flush" the memory without deleting information within the script (i.e. my matrix of parameter combinations and other variables) or should I rather manually restart the computer after 30,000 model runs? I could also think of a batch file for that, but I'm not good at coding them, so I'd rather look for a proper way to do all this in Python.

Any hints? Thanks!

Edit: alright, here is the core code for the iteration. Before that, a matrix of size 10 x 2,000,000 is created as a look up table. This matrix is unchanged over time!

from subprocess import Popen

for run in xrange(n_total): # n_total is 2,000,000; xrange avoids building a 2,000,000-element list in Python 2

    # build list for passing the parameters ("para_pro")
    para_pro = ['']*(13)
    para_pro[0:n_para] = para_grid[:,run]

    # add some stuff
    para_pro[-3] = SZA
    para_pro[-2] = OZA
    para_pro[-1] = rAA

    # conversion to string and add end-of-line character
    para_pro = [str(value) + '\n' for value in para_pro]

    # write to parameter-file
    with open(model_dir + parameter_file, 'w') as PFile:
        PFile.writelines(para_pro)

    # generate file name (some internal labelling of the file which later tells me which combination I am looking at)
    filename = "run" + str(run+1) + "ID" + final_namelist[run]

    # create String for Batch-File
    batch = ['cd %s\n' % model_dir, 'D:\n', '(\n', 'echo %s\n' % filename, 'echo 2\n', 'echo 2\n', 'echo 2\n', 'echo 1\n', ') | fortran_model.exe\n']

    print "ID%i of %i" % (int(run+1), n_total)

    batch_id = run % 50 # creates numbers from 0 to 49 to cycle the used Batch-file if it is still open for read in the model

    # Write batch file
    with open(model_dir + "Batch_py%i.bat" % batch_id, 'w') as BFile:
        BFile.writelines(batch)

    # Execute Batch -> execution of the model

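    p = Popen(model_dir + "Batch_py%i.bat" % batch_id)
    # communicate() waits for the batch file to finish and returns (stdout, stderr);
    # both are None here because no pipes were requested
    stdout, stderr = p.communicate()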
offeltoffel
  • This is impossible to answer without an analysis of your code; you may well have a memory leak somewhere (holding on to Python objects where you don't need them anymore) but there is no easy one-size-fits-all "here is how you clear memory" recipe to provide. – Martijn Pieters Aug 16 '16 at 08:18
  • You may want to [profile memory usage](http://stackoverflow.com/questions/552744/how-do-i-profile-memory-usage-in-python) as a starting point, then ask a more specific question with code. – Martijn Pieters Aug 16 '16 at 08:19
  • Hi, it is hard to say anything without knowing more about the code that runs etc, but I'd suggest to first look for bottlenecks within your code, for example, is the process doing something that can pile up (access to files, database or anything else)? After that, maybe try to change the iteration of the combinations of the parameters to use a generator that will hold the memory amount steady. – mkaran Aug 16 '16 at 08:22
  • In the python code, I keep re-using the same lists. Before the iteration starts, I create a huge matrix - some sort of look up table - but that only takes about 20 seconds and then it's there unchanged. Maybe the problem is at the fortran code which I cannot access unfortunately... – offeltoffel Aug 16 '16 at 08:30
  • @mkaran: how can an access to files pile up? I do access files, yes. I overwrite a very short text-file (which contains the information for the model), I overwrite a very short batch-file and I run the batch-file. Then I start over again. Could that lead to problems? – offeltoffel Aug 16 '16 at 08:59
  • @all: I am sorry that I wasted your time. I found the problem and it's really embarrassing... it's not the code, it's Windows. The more files there are in one folder, the slower new files are created. I have to clear out the directory and move the files to another folder; then the process runs as quickly as before. – offeltoffel Aug 16 '16 at 09:23
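
The fix described in the last comment (moving finished outputs out of the working folder) could be automated roughly like this. It is only a sketch under assumptions: that the model writes its .csv files directly into `model_dir`, and that spreading them over numbered subfolders is acceptable. The function name `archive_outputs`, the `archive_root` folder, and the default of 1000 files per folder are all hypothetical and not part of the original script.

```python
import os
import shutil

def archive_outputs(model_dir, archive_root, run, files_per_folder=1000, suffix=".csv"):
    """Move finished output files from model_dir into a numbered subfolder.

    Hypothetical helper: the names and the default of 1000 files per
    folder are illustrative assumptions, not values from the question.
    """
    # Runs 0..999 go to batch_00000, runs 1000..1999 to batch_00001, and so on.
    target = os.path.join(archive_root, "batch_%05d" % (run // files_per_folder))
    if not os.path.isdir(target):
        os.makedirs(target)

    moved = 0
    for name in os.listdir(model_dir):
        if name.endswith(suffix):
            shutil.move(os.path.join(model_dir, name), os.path.join(target, name))
            moved += 1
    return moved
```

Calling `archive_outputs(model_dir, archive_root, run)` right after `p.communicate()` in the loop would keep the working folder nearly empty, so each subfolder ends up with about `files_per_folder` files instead of one folder collecting all 2,000,000.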

0 Answers