0

I have a pretty specific problem. I want to measure execution time of the generator loop (with the yield keyword). However, I don't know in what intervals next() will be called on this generator. This means I can't just get the timestamp before and after the loop. I thought getting the timestamp at the beginning and end of each iteration will do the trick but I'm getting very inconsistent results.

Here's the test code:

import time


def gen(n):
    total = 0
    for i in range(n):
        t1 = time.process_time_ns()
        # Something that takes time
        x = [i ** i for i in range(i)]
        t2 = time.process_time_ns()
        yield x
        total += t2 - t1
    print(total)


def main():
    for i in gen(100):
        pass

    for i in gen(100):
        time.sleep(0.001)

    for i in gen(100):
        time.sleep(0.01)


if __name__ == '__main__':
    main()

Typical output for me looks something like this:

2151918
9970539
11581393

As you can see it looks like the delay outside of the loop somehow influences execution time of the loop itself.

What is the reason of this behavior? How can I avoid this inconsistency? Maybe there's some entirely different way of doing what I'm trying to achieve?

sajran
  • 317
  • 4
  • 15
  • 1
    what is your problem? these numbers are in `nano` seconds – Mohsen_Fatemi Oct 24 '19 at 13:17
  • @Mohsen_Fatemi The problem is that the second number is greater than the first number, and the third number is greater than the second. Code executed inside of the loop is exactly the same so I don't see why the execution time changes. And no, this is not just coincidence. If I run this script many times it's clear that second and third measurement always returns greater values. – sajran Oct 24 '19 at 13:20
  • use `timeit` to time execution of small code snippets- there is inherent variability if your background processes don't forget – Chris_Rands Oct 24 '19 at 13:21
  • @Chris_Rands I don't think timeit is suitable in my case. In the post the timed code is very short and simple, however in real use this code is not short at all, and timing it is only a side-feature. – sajran Oct 24 '19 at 13:30
  • well if you're real use case the time is not short at all then you don't need to worry about nanoseconds – Chris_Rands Oct 24 '19 at 13:31
  • @Chris_Rands In the post the third result is 5 times greater than the first. In my reals use case I time bigger code but also use bigger delays. Delayed code still can return result as much as 2 times greater than the non-delayed one. This isn't nanoseconds, this is big difference. – sajran Oct 24 '19 at 13:48

1 Answers1

1

You can switch the yield x and total += t2 - t1 lines to only count the time it takes to create x.

For more in dept also see: Behaviour of Python's "yield"

Encrypted
  • 188
  • 7
  • 1
    Describe more. this is not an answer to OP – Mohsen_Fatemi Oct 24 '19 at 13:19
  • I edited my comment to point to a more in dept post about the yield behaviour. I found: > You can think of it as if the function which yields simply "pauses" when it comes across the yield. The next time you call it, it will resume after the yield keeping the state that it was in when it left. – Encrypted Oct 24 '19 at 13:22
  • Thank you for the answer, however I don't understand how switching those lines will change anything in the result of this code. – sajran Oct 24 '19 at 13:25
  • Ah i think i misread and misunderstood your question. Why are you using `t2 = time.process_time_ns()` and not `t2 = time.time()`? I have never worked with `time.process_time_ns()` before but it seems to me that this counts the whole cpu time the thread is active? So if you switch sleeps in the for loops do you get the same output? I have to update from 3.6 in order to test that function sorry. – Encrypted Oct 24 '19 at 13:56
  • @Encrypted I used `time.process_time_ns()` in the post because I thought it makes sense in this usecase (it's supposed to not count time during `time.sleep()`) but I tried with all available functions including `time.time()`, result is always the same. – sajran Oct 25 '19 at 09:52
  • Ahh i see what you mean. I have gotten some results where the `pass` is quicker then the `time.sleep(0.001)` however there seems to be a correlation between the execution time of the loop the yield value is used and the time it takes to calculate the next value to yield. I suspect this is really low level how python uses memory. And maybe how sleep puts the variables like `total` and `n` away so it needs to be loaded again. Do not know any of this for sure. – Encrypted Oct 25 '19 at 13:05