
This post says:

> if the body of your loop is simple, the interpreter overhead of the loop itself can be a substantial amount of the overhead

and gives this example to illustrate joblib's Parallel:

import numpy as np
from joblib import Parallel, delayed

def convolve_random(size):
    ''' Convolve two random arrays of length "size" '''
    return np.convolve(np.random.random_sample(size), np.random.random_sample(size))

%timeit convolve_random(40000)
1 loops, best of 3: 904 ms per loop

# In serial, as a plain list comprehension
%timeit [convolve_random(40000 + i*1000) for i in xrange(8)]
1 loops, best of 3: 8.69 s per loop

# In parallel, with 8 jobs
%timeit Parallel(n_jobs=8)(delayed(convolve_random)(40000 + i*1000) for i in xrange(8))
1 loops, best of 3: 2.88 s per loop

In this case, is there a way to estimate the Python interpreter overhead of the loop itself?

  • You could try timing a loop with an empty body; see the sketch after these comments. – Haroldo_OK Aug 29 '19 at 12:15
  • Those links both go to the same place, and it doesn't contain that quote. That said, Python should take a few nanoseconds per loop iteration; this is many orders of magnitude smaller than your function. – Sam Mason Aug 29 '19 at 12:49
  • @SamMason Thanks for the reminder. I've updated the OP. Does your "orders of magnitude" mean `An order of magnitude is an approximate measure of the number of digits that a number has in the commonly-used base-ten number system. It is equal to the whole number floor of logarithm (base 10). For example, the order of magnitude of 1500 is 3, because 1500 = 1.5 × 10^3.`? –  Aug 29 '19 at 13:05
  • @whnlp just consider that the difference is such that it makes the overhead negligible for all practical considerations - if you need to worry about this overhead then the solution is most probably to switch to plain C - which is what libraries like numpy are doing FWIW, and which is why they are much faster than plain Python loops. – bruno desthuilliers Aug 29 '19 at 13:13
  • Yes; in this case the overhead you're worrying about (interpreter overhead) is approx. 6 orders of magnitude (i.e. a million times) smaller than the thing you care about (the runtime of your function). OP conventionally means "original poster", i.e. the person who originally posted the question. – Sam Mason Aug 29 '19 at 13:14
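
Following Haroldo_OK's suggestion above, here is a minimal sketch (not from the original post; the iteration count is arbitrary) that estimates the per-iteration interpreter overhead by timing a loop with an empty body:

import timeit

n = 10 ** 6
# Time n iterations of a loop whose body is a no-op, so that only the
# interpreter's own loop machinery is measured.
t = timeit.timeit("for _ in range(n): pass", setup="n = %d" % n, number=1)
print("%.1f ns of interpreter overhead per iteration" % (t / n * 1e9))

On typical hardware this comes out at a few tens of nanoseconds per iteration, consistent with Sam Mason's point that the loop overhead is roughly six orders of magnitude below the ~1 s runtime of convolve_random.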

1 Answer


Q : is there a way to estimate the Python interpreter overhead of the loop itself?

You already received an answer to this 3 hours ago, with directions to benchmarking templates that use [us]-resolution timings.

If you have not yet tried the recommended test templates, go for it and you will get hard data on what the looping costs are (best measured with beyond-cache-size data samples, and avoiding the cost of the np.random.random() generation itself).
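
As a minimal illustration of such a measurement (an assumed sketch, not the exact template the answer refers to): pre-generate the random inputs once, so the timed region covers only np.convolve() and the loop itself, and take sub-microsecond timestamps around each call:

import numpy as np
from timeit import default_timer as clock  # wall-clock timer, sub-[us] resolution

SIZE = 40000
# Pre-generate the inputs, so the np.random.random_sample() cost is
# paid outside of the timed region.
a = np.random.random_sample(SIZE)
b = np.random.random_sample(SIZE)

per_call_us = []
for _ in range(8):
    t0 = clock()
    np.convolve(a, b)
    per_call_us.append((clock() - t0) * 1e6)

print("per-call durations [us]:", ["%.1f" % t for t in per_call_us])

Comparing these per-call durations against an empty-loop timing (as sketched under the question's comments) gives the ratio of useful work to looping overhead directly.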

user3666197
  • wow, your last answer is comprehensive... @whnlp the *30.9 seconds were "wasted"* comment is basically the carry-over for this question. As a rough guide, I'd suggest aiming to do >10 ms of work in each parallel job, and making sure you don't move too much data around, as that can be expensive. – Sam Mason Aug 29 '19 at 21:44
  • What do [T0], [T0+tsB] and [T0+tsB+tpB] in [Revisit the Amdahl's law](https://stackoverflow.com/revisions/18374629/3) mean? What do "tsB", "tpB" stand for? –  Aug 30 '19 at 02:23
  • @whnlp In Fig. 1, [T0] is the time when either the SEQ.A or SEQ.B pure-[SERIAL] processing starts. The process-flow (A), labeled SEQ.A, finishes the whole job at [T0+tsA] (after a duration of tsA = **t**ime-of-**s**erial-processing-**A** [s]). The process-flow (B), consisting of both a serial part and a parallel part, labeled SEQ.B resp. PAR.B, finishes the whole job at [T0+tsB+tpB], after a duration of tsB = **t**ime-of-**s**erial-processing-**B** [s] plus a duration of tpB = **t**ime-of-**p**arallel-processing-**B** [s], based on n ~ {1|2|4|8} resources R.i being free & an indivisible atomic-job duration "____." – user3666197 Aug 30 '19 at 05:32
  • Thanks for your answer! Does "PAR-RESOURCE" mean the resources that could be used for parallel computing, such as processor cores, multiple threads, or a cluster of computers? Which one? Or could it be any of them? –  Aug 30 '19 at 07:27
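
To make the notation from these comments concrete, here is a small sketch (the values tsB = 1.0 and tpB = 7.0 are made-up numbers, chosen only for illustration) of how tsB, tpB and the number n of parallel resources combine under the classical Amdahl's-law reading of Fig. 1:

def speedup(tsB, tpB, n):
    ''' Amdahl speedup of a job with serial part tsB [s] and parallel part
        tpB [s], when the parallel part is spread over n resources. '''
    T_1 = tsB + tpB             # duration on one resource: [T0+tsB+tpB] - [T0]
    T_n = tsB + tpB / float(n)  # duration with PAR.B split n-ways
    return T_1 / T_n

for n in (1, 2, 4, 8):
    print("n = %d  ->  speedup %.2f x" % (n, speedup(1.0, 7.0, n)))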