
I want to find optimal parameters i, j, k in 0..99 for a given computational problem, and I need to run:

for i in range(100):
    for j in range(100):
        for k in range(100):
            dothejob(i, j, k)    # 1 second per computation

This takes a total of 10^6 seconds, i.e. 11.5 days.

I started by splitting the work among 4 processes (to use 100% of the computing power of my 4-core machine):

for i in range(100):
    if i % 4 != 0:      # replace 0 with 1, 2, or 3 for the parallel scripts #2, #3, #4
        continue
    for j in range(100):
        for k in range(100):
            dothejob(i, j, k)

        with open('done.log', 'a+') as f:    # log what has been done
            f.write("%i %i\n" % (i, j))

But I have problems with this approach:

  1. I have to run python script.py, then open script.py and replace line 2 with if i % 4 != 1, run python script.py again, replace line 2 with if i % 4 != 2, run it again, replace line 2 with if i % 4 != 3, and run it a fourth time.

  2. Let's say the loop is interrupted (the computer needs to reboot, the script crashes, etc.). At least all the (i, j) already done are recorded in done.log (so we don't have to start from 0 again), but there's no easy way to resume the work. (OK, we can open done.log, parse it, and discard the (i, j) already done when restarting the loops - I started doing this, as sketched just below - but it felt like reinventing, in a dirty way, something that already exists.)
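For reference, a minimal sketch of that parse-and-skip approach (assuming the done.log format written by the loop above, one "i j" pair per line):

import os

# build the set of (i, j) pairs already logged by a previous run
done = set()
if os.path.exists('done.log'):
    with open('done.log') as f:
        for line in f:
            done.add(tuple(int(x) for x in line.split()))

for i in range(100):
    if i % 4 != 0:                        # same per-process split as above
        continue
    for j in range(100):
        if (i, j) in done:                # already finished before the interruption
            continue
        for k in range(100):
            dothejob(i, j, k)
        with open('done.log', 'a') as f:  # log what has been done
            f.write("%i %i\n" % (i, j))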

I'm looking for a better solution to this (for example, map/reduce is probably overkill for such a small task, and not easy to use in a few lines of Python).

Question: How can I make the computation for i in range(100): for j in range(100): for k in range(100): dothejob(i, j, k) easily splittable among multiple processes and easily resumable (e.g. after a reboot) in Python?

Basj
  • Without reading all of the question: Are there results that can be cached and reused at some point? – rocksteady Feb 02 '18 at 09:54
  • @rocksteady: No. Each computation with a new i, j, k is a totally new task, nothing can be reused from past work. – Basj Feb 02 '18 at 09:55
  • Can't you just use a parallel for loop, like here: https://pythonhosted.org/joblib/parallel.html? (A sketch of this is included after these comments.) – CodeZero Feb 02 '18 at 09:58
  • @Basj It depends on what `dothejob()` actually is doing. – rocksteady Feb 02 '18 at 09:59
  • @datasailor Would it log the i, j, k already done so that it's resumable later (after a reboot) without restarting from 0? – Basj Feb 02 '18 at 09:59
  • Use a library like joblib as suggested by @datasailor or multiprocessing with the `concurrent.futures` package instead of having to run 4 scripts. – Ignacio Vergara Kausel Feb 02 '18 at 09:59
  • About restartability, the parse-the-file-and-skip-what's-done approach is IMO your best choice. AFAIK there's nothing that can natively pick up where it left off after a reboot. – GPhilo Feb 02 '18 at 10:00
  • @rocksteady Let's say `dothejob(i, j, k)` tries to find the smallest real-number solution to a complex equation in which i, j, k are parameters. Let's say `dothejob(i, j, k)` doesn't help at all to compute `dothejob(i', j', k')` if (i, j, k) is different from (i', j', k'). – Basj Feb 02 '18 at 10:01
  • @GPhilo what about having a function that yields the result and pickling the whole thing every now and then? Although I agree that storing and parsing would be my approach too. – Ignacio Vergara Kausel Feb 02 '18 at 10:01
  • @GPhilo I thought there's probably a high-level Python module `job` that takes care of the "split among processes" / log what is done / pause / resume / etc. – Basj Feb 02 '18 at 10:03
  • @IgnacioVergaraKausel yes I was starting to write such code with storing and parsing, but then I felt like "Python probably has a (native or 3rd party) high-level tool for this". – Basj Feb 02 '18 at 10:05
  • Not sure it would cover all your "needs" but maybe you can check `dask`. If you're worried about power outages or crashes, just chunk your problem into smaller parts that you know can finish without problems, so that if a problem occurs it's not too expensive (time-wise) to redo them. – Ignacio Vergara Kausel Feb 02 '18 at 10:06
  • @Basj for the automatic split and task management, look at the libraries mentioned above. That, however, still assumes there's a "manager" process that spawns tasks on workers. If your manager dies, or if the PC is rebooted, you'll have to load the last known state from disk (which is exactly what the parsing would do...). – GPhilo Feb 02 '18 at 10:13
  • If you require some sort of persistence that survives reboots, the simplest thing I would start with is something like SQLite. That would make your loop more complicated, since you'd have to check the database first to see whether a particular computation has already been performed. But it would already be better than parsing log files, and faster if you design the database schema and query correctly. (A sketch is included after these comments.) – bow Feb 02 '18 at 10:18
  • Just want to add that brute-forcing the search for the optimal parameters should be your last resort. Even in cases where your "fitness landscape" is rather irregular, e.g., with many minima/maxima, approaches like evolution strategies or genetic algorithms would be smarter, requiring fewer evaluations. – Ignacio Vergara Kausel Feb 02 '18 at 10:19
  • @IgnacioVergaraKausel Yes, sure, of course it's better to make use of possible continuity / regularity / convexity / minima-maxima approaches, but sometimes, as a last resort, we still need to do such things. – Basj Feb 02 '18 at 10:21
  • @bow Yes exactly about SQLite, that's what I was starting to use, thus this question: [Share a dict with multiple Python scripts](https://stackoverflow.com/questions/48409610/share-a-dict-with-multiple-python-scripts) – Basj Feb 02 '18 at 10:23
  • Evolutionary approaches don't assume continuity or regularity or convexity... and it's still better than trying every possibility. – Ignacio Vergara Kausel Feb 02 '18 at 10:23
  • @IgnacioVergaraKausel I don't know if it would apply to my problem, but I'll look at it. Any good introduction reference about this computational topic? – Basj Feb 02 '18 at 10:24
  • @Basj not sure if this is a good introduction, but the animations are very helpful: http://blog.otoro.net/2017/10/29/visual-evolution-strategies/ I'm also not sure if this fits your problem, but it is a very general approach. There is a fancy Python package `deap` that implements diverse evolutionary algorithms (but I haven't used it in any serious capacity). – Ignacio Vergara Kausel Feb 02 '18 at 10:28
  • Thanks a lot @IgnacioVergaraKausel. – Basj Feb 02 '18 at 10:29
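As a rough illustration of the joblib suggestion in the comments above (a sketch only, assuming joblib is installed; it handles the splitting across processes but not the resuming):

import itertools as it
from joblib import Parallel, delayed      # requires `pip install joblib`

# run the 100*100*100 jobs on 4 worker processes
Parallel(n_jobs=4)(
    delayed(dothejob)(i, j, k)
    for i, j, k in it.product(range(100), range(100), range(100))
)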
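And a rough sketch of the SQLite idea from the comments (the progress.db file and the done table are hypothetical names; each run checks the table before computing, so progress survives reboots):

import itertools as it
import sqlite3

# timeout helps if several processes share the same database file
conn = sqlite3.connect('progress.db', timeout=30)
conn.execute("CREATE TABLE IF NOT EXISTS done "
             "(i INTEGER, j INTEGER, k INTEGER, PRIMARY KEY (i, j, k))")

for i, j, k in it.product(range(100), range(100), range(100)):
    if conn.execute("SELECT 1 FROM done WHERE i=? AND j=? AND k=?", (i, j, k)).fetchone():
        continue                        # computed in a previous run
    dothejob(i, j, k)
    conn.execute("INSERT INTO done VALUES (?, ?, ?)", (i, j, k))
    conn.commit()                       # persist immediately so a crash loses at most one job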

1 Answer


Just map the product using a pool of processes, for example:

import itertools as it
from multiprocessing import Pool
the_args = it.product(range(100), range(100), range(100))
pool = Pool(4)

def jobWrapper(args):  # we need this to unpack the (i, j, k) tuple
    return dothejob(*args)

res = pool.map(jobWrapper, the_args)

If you want to resume it, knowing the last (i, j, k) from the log, just skip everything previously computed from the_args:

the_args = it.product(range(100), range(100), range(100))
# skip previously computed
while True:
    if next(the_args) == (i, j, k):
        break
...

where (i, j, k) is the tuple with the last computed values.
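For completeness, one way to recover that last tuple (a sketch, assuming the log is extended to record all three indices, one "i j k" line per finished job, and is non-empty):

def last_logged(path='done.log'):
    # return the last (i, j, k) tuple written to the log
    last = None
    with open(path) as f:
        for line in f:
            last = tuple(int(x) for x in line.split())
    return last

i, j, k = last_logged()   # e.g. (50, 22, 11)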

Netwave
  • Nice, I think this totally answers part #1! Being curious: would `it.product(range(1000), range(1000), range(1000))` create a list in memory with 1 billion items? – Basj Feb 02 '18 at 10:15
  • Would you have an idea about #2: how to pause / resume the computation? Or even stop and reboot computer and then resume? – Basj Feb 02 '18 at 10:17
  • you can pass the last computed values of i, j, k as arguments to the script and consume the product iterator until you find them, before starting the heavy computation – Netwave Feb 02 '18 at 10:18
  • @Basj, itertools creates an iterator, so it will not load the items into memory but compute them as needed – Netwave Feb 02 '18 at 10:20
  • I get this: `File "C:\Users\User\Desktop\test.py", line 14, in main res = pool.map(jobWrapper, the_args) File "C:\Python27\lib\multiprocessing\pool.py", line 251, in map return self.map_async(func, iterable, chunksize).get() File "C:\Python27\lib\multiprocessing\pool.py", line 558, in get raise self._value cPickle.PicklingError: Can't pickle : attribute lookup __builtin__.function failed` – Basj Feb 02 '18 at 10:34
  • @Basj not sure, but I guess the problem is that you're using python 2.7. Do yourself a favor, if you can, and use a more modern version ;) – Ignacio Vergara Kausel Feb 02 '18 at 10:39
  • @IgnacioVergaraKausel using both Windows and Python 2.7 makes it even more hazardous ;) – Basj Feb 02 '18 at 10:47
  • @Netwave your solution is nearly perfect, I'll just mention a small corner case: let's say `dothejob(50,22,11)` has just begun and the 2nd process is now beginning `dothejob(50,22,12)` but the latter is finished before the former and we interrupt here. Then the "last done item logged" will be `50,22,12`. Then when we resume, we'll restart at `50,22,13`, forgetting that `dothejob(50,22,11)` has never been finished completely (we should restart at `50,22,11`!). – Basj Feb 02 '18 at 11:05
  • @Basj, then instead of writing to the log at the start, just log whenever the job is finished. – Netwave Feb 02 '18 at 11:19
  • @Basj, concerning the error, the function you pass to map should be picklable (no lambdas allowed); try to define it at module level, and have an `if __name__ == "__main__"` guard for the main code – Netwave Feb 02 '18 at 11:21
  • @Netwave that's what I already do in the previous example. In other words: dothejob(50,22,11) starts, dothejob(50,22,12) starts, dothejob(50,22,12) finishes, (50,22,12) is logged. Computation is stopped (but dothejob(50,22,11) hadn't finished!). When we restart, the last log entry will be (50,22,12), making us believe we should begin at (50,22,13). Wrong: we never finished (50,22,11). Anyway, that's not a big issue; all I need to do is log all items done, and then when doing your `while True: if next(the_args) == (i, j, k): break` I'll modify it to break at the first item not already present in the log (a sketch along these lines follows these comments). – Basj Feb 02 '18 at 11:28
  • Last thing (thank you very much already for everything @Netwave!): about the picklable thing, won't it work with just `def dothejob(i, j, k): print i, j, k`? It would be barely usable if we couldn't print things or do I/O. [Here is the related question](https://stackoverflow.com/q/48581455/1422096). – Basj Feb 02 '18 at 11:33
  • @Basj, answered in the related question: https://stackoverflow.com/questions/48581455/run-a-multiprocessing-job-with-python-2-7-windows – Netwave Feb 02 '18 at 12:14
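Putting the last few comments together, a rough sketch of the log-on-completion variant (assuming done.log is extended to one "i j k" line per finished job, written only after dothejob returns, and that dothejob is defined or imported at module level so it can be pickled):

import os
import itertools as it
from multiprocessing import Pool

def jobWrapper(args):
    dothejob(*args)                            # assumed to be defined/imported at module level
    with open('done.log', 'a') as f:           # logged only once the job has finished
        f.write("%i %i %i\n" % args)

def main():
    # collect everything a previous run already finished
    done = set()
    if os.path.exists('done.log'):
        with open('done.log') as f:
            done = set(tuple(int(x) for x in line.split()) for line in f)

    # feed the pool only the tuples that are not yet in the log
    the_args = (t for t in it.product(range(100), range(100), range(100))
                if t not in done)
    pool = Pool(4)
    pool.map(jobWrapper, the_args)

if __name__ == '__main__':                     # required on Windows, as discussed above
    main()

Note that the workers append to done.log concurrently; short one-line appends are usually fine, but a per-worker log file (or the SQLite approach above) would be safer.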