3

I run this code multiple time with different SLEEP_TIME, for example SLEEP_TIME=0, SLEEP_TIME=1e-3, SLEEP_TIME=10e-3 and also omitted the time.sleep line altogether from the code. For every value of SLEEP_TIME the measured average work time changes, even though the sleep is outside the measured code. This makes zero sense to me - why would calling time.sleep change the way the process behaves even though the code absolutely does not depend on the sleep?

I tested the following code with both linux and windows and the behavior is similar (though in windows omitting the sleep altogether causes the performance to degrade significantly).

import numpy as np
import multiprocessing 
import time

SLEEP_TIME = 1e-3

def do_work():
    total_time = 0
    time_to_run = 500
    for i in range(time_to_run):
        t0 = time.time()
        
        # start work
        nparr = np.ones((1000,100,30))
        nparr[nparr == 0] = 1
        sp = nparr.shape # to synchronize previous call
        # end work
        
        t1 = time.time()                
        total_time += t1 - t0        
        time.sleep(SLEEP_TIME) # WHY DOES THIS MATTER???? THIS IS OUTSIDE THE WORK AND OUTSIDE MEASUREMENT
    
    print(f"avg work time: {1000 * total_time / time_to_run:.2f}ms")        

if __name__ == '__main__':
    
    p1 = multiprocessing.Process(target=do_work)        
    p1.start()
    p2 = multiprocessing.Process(target=do_work)        
    p2.start()

    p1.join()
    p2.join()
    

Example results (on linux):

No sleep (commenting out time.sleep)

Output:

avg work time: 4.50ms

avg work time: 4.56ms

SLEEP_TIME = 0

Output:

avg work time: 4.46ms

avg work time: 4.52ms

SLEEP_TIME = 1e-3

Output:

avg work time: 4.76ms

avg work time: 4.82ms

SLEEP_TIME = 10e-3

Output:

avg work time: 7.05ms

avg work time: 7.07ms

What is happening here? Is the OS trying (and failing) to optimize my process? And how can I execute the work part as fast as possible regardless of the amount of previous sleep time?

ChatGPT suggested I should add to the top of the file:

import os
os.environ["OMP_NUM_THREADS"] = "1"  # or whatever number you choose

While it improves the time of execution with large sleeps, the execution time still defers.

EDIT: I fixed the join strategy like some have rightly suggested. Though it's doesn't affect the problem in question it is better to write the code correctly to avoid confusion.

Ofer Barasofsky
  • 251
  • 2
  • 8
  • I haven't checked, but I'd be surprised if the OS doesn't do a context switch while you're sleeping... you could try setting the sleep time to 0 and you will perhaps see the same effect. – thebjorn Jun 15 '23 at 13:46
  • No, with SLEEP_TIME=0 the behavior is exactly the same as with omitting the time.sleep line altogether (in linux) – Ofer Barasofsky Jun 15 '23 at 13:49
  • More info on sleep(0) https://stackoverflow.com/questions/7273474/behavior-of-pythons-time-sleep0-under-linux-does-it-cause-a-context-switch – thebjorn Jun 15 '23 at 13:55
  • I've seen this thread but it doesn't seem to be related to my problem. With time.sleep(0) the measured time is identical to removing time.sleep from the code – Ofer Barasofsky Jun 15 '23 at 13:57
  • Just a heads up that you never join "p-0". – JonSG Jun 15 '23 at 14:15
  • You are right. However, it doesn’t matter for the sake of this example. In fact, running it with just a single process yields the same phenomenon with regards to the sleep – Ofer Barasofsky Jun 15 '23 at 15:34
  • I don't think it's relevant to your question but your join() strategy is flawed as you will only ever join() the second process that you started – DarkKnight Jun 15 '23 at 15:36
  • Thanks JonSG and DarkKnight. I've edited my question. Though it's doesn't affect the problem in question it is better to write the code correctly to avoid confusion – Ofer Barasofsky Jun 15 '23 at 16:23

3 Answers3

2

I reproduced the behavior of your python script on my Ubuntu machine. In my case it was not specific to python, and I found similar performance degradation in a c++ program that sleeps between each computation.

There are various mechanisms in Linux that reduce the frequency of the CPU(s) in order to save power when the system load is low. In my case, the "CPU frequency scaling governor" was set to "powersave" on all CPUs. You can check this by running:

cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

In my case, changing to "performance" yields similar time measurements with and without sleep, and the measured time is now even lower than the measured time without sleeping before changing from "powersave". To change these settings, run:

echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

Note that this will use more power and could lead to more heat generation, so you might want to monitor the temperature of your CPUs to make sure they don't overheat.

Ariel B
  • 36
  • 2
0

Try running measuring multiple runs, for example by changing do_work to:

RUNS = 10

def do_work():
    time_to_run = 50
    min_total_time = None
    for j in range(RUNS):
        total_time = 0
        for i in range(time_to_run):
            t0 = time.time()

            # start work
            nparr = np.ones((1000,100,30))
            nparr[nparr == 0] = 1
            sp = nparr.shape # to synchronize previous call
            # end work

            t1 = time.time()
            total_time += t1 - t0
            time.sleep(SLEEP_TIME) # WHY DOES THIS MATTER???? THIS IS OUTSIDE THE WORK AND OUTSIDE MEASUREMENT
        if min_total_time is None:
            min_total_time = total_time
        else:
            min_total_time = min(min_total_time, total_time)
    
    print(f"min work time: {1000 * min_total_time / time_to_run:.2f}ms")

I think you'll find the differences disappear.

The thing is, the time taken by your computer is very inconsistent because your OS and other programs running in the background are constantly taking up bits of CPU resources. Sometimes your OS just needs to do something -- it may need to check for updates or some scheduled activity will kick in.

Testing the average isn't really useful because one single outlier can have an outsized effect.

The more runs you try, the better you'll be able to compare differences.

Jasmijn
  • 9,370
  • 2
  • 29
  • 43
  • This does not work. Prior to posting I've tested the code with much larger loops. The problem is not "inconsistent measuring". In fact, the results are very consistent with regards to a given sleep time and are independent of the times the loop run – Ofer Barasofsky Jun 15 '23 at 14:11
  • Weird. I cannot reproduce your results if I take the minimum. – Jasmijn Jun 15 '23 at 17:32
  • You can take the median to see that it doesn't matter. You can also record or print all measurements to see that they are all close to the average. It's not a case of a single outlier messing with the average. Make sure the work done in your machine is close to 4-5ms in the best case so that the effect is clear – Ofer Barasofsky Jun 15 '23 at 18:03
  • I don't know what to tell you, no matter what I try I cannot reproduce the effect you describe on my computer. – Jasmijn Jun 15 '23 at 20:12
  • Interesting. What processor and OS are you using? – Ofer Barasofsky Jun 15 '23 at 20:28
  • 1
    I got a 2.4GHz Intel i7, Windows 10 (22H2) – Jasmijn Jun 15 '23 at 20:31
0

You have a rather fast machine, and this is a rather short computation. So it exagerates the effect.

On my machine, for example, the same computation takes 14.2 ms. And I am indeed able to reproduce the effect, but it is more subtle, and I had to run it several time to compute avg/mean, and then p-value to be sure that it wasn't just luck (with single runs, sometimes I get the inverse tendency because of chance).

So, it is around 14 ms for 0 sleep, 14.2 for 1ms, 14.5 for 10 ms, and 14.5 also for 100 ms. So, it does exist, but is not at all like the sleep time was counted into the work time (for 100 ms I had to wait 1 minute, while my machine load was practically 0, so no cpu used, before getting this: most of the time is spent sleeping, and, nevertheless, it doesn't change that much the work time).

I can't be sure of why. But I would bet for @thebjorn hypothesis: because your cpu is bored with your process that does only sleeping, it switches to other processes... or something like that.

Note that if your do use your cpu, so it has really something to do, like using 8 processes instead of 2 (on my machine with 4 cores. If you have more cores, which is likely, you should use even more processes), then the effect is inverted: With 8 concurrent processes, work time is 52, 48, 42 ms respectively for sleep time of 0, 1, and 10 ms.

Which makes sense: the more you sleep, the less your processes are competing which each other. If I use a huge sleep time, then, back to around 15 ms, because process being most of the time sleeping, the probability that the end up being in competition between each other for cores is negligible.

So, what you are measuring here is how your process are in competition with each other, and with others. If the work of your processes is too small to bother each other (2 processes on a 4 cores machines, even if they are never sleeping, are not in competition with each other), then, what matters is the competition with other processes. And if you sleep a lot, maybe your processes are more often interrupted to let the small competition from outside bother them. I don't know. Scheduler works in mysterious ways. If your process have more than enough work for your cpu (which, I guess, should be the case otherwise you wouldn't be bothering measuring performance and using multiple processes), then, what matters is competition with each other, and on the contrary, the more you sleep, the less you compete with others.

chrslg
  • 9,023
  • 5
  • 17
  • 31
  • You can lower the amount of work done by the process to match my timings (~4-5ms without sleeps) and then you will see a much larger effect, similar to what I'm seeing. Something like nparr = np.ones((1000,100,10)). The longer the work done by the process, the smaller the effect becomes in comparison. The code I wrote here is a minimal demonstration of the problem, my real code must operate within 4-6ms and this problem really hurts my real-time performance. – Ofer Barasofsky Jun 15 '23 at 16:25