
I noticed that code on my computer runs significantly slower if I call std::this_thread::sleep_for before executing it. I wrote a small Google Benchmark to demonstrate this: just a simple for loop that is executed 100 million times.

BENCHMARK CODE WITHOUT SLEEP_FOR:

int test(int iterations) {
  int a;
  for (int i = 0; i < iterations; i++) {
    a = i * 200 + 100;
  }
  return a;
}

void runBenchmark(benchmark::State &state) {
  for (auto _ : state) {
    test(100000000);
  }
}
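
For completeness, a minimal full version of this benchmark file might look like the sketch below. The includes and the registration are not part of my original snippet; the MeasureProcessCPUTime(), Repetitions(10) and millisecond-unit setup are guesses based on the "/process_time" rows and the 10 repetitions in the output, and the arithmetic is widened here only to avoid the signed-overflow issue flagged in the comments below:

#include <benchmark/benchmark.h>

long long test(int iterations) {
  long long a = 0;                 // initialized; widened to long long to sidestep signed overflow
  for (int i = 0; i < iterations; i++) {
    a = i * 200LL + 100;           // the original uses plain int here, which overflows for large i
  }
  return a;
}

void runBenchmark(benchmark::State &state) {
  for (auto _ : state) {
    test(100000000);
  }
}

// Registration (assumed setup, not shown in the original snippet):
BENCHMARK(runBenchmark)
    ->Unit(benchmark::kMillisecond)
    ->Repetitions(10)
    ->MeasureProcessCPUTime();

BENCHMARK_MAIN();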

BENCHMARK RESULTS WITHOUT SLEEP_FOR:
                                Time            CPU               Iterations
../1/process_time               145 ms          145 ms            1
../1/process_time               144 ms          144 ms            1
../1/process_time               144 ms          144 ms            1
../1/process_time               144 ms          144 ms            1
../1/process_time               144 ms          144 ms            1
../1/process_time               144 ms          144 ms            1
../1/process_time               144 ms          144 ms            1
../1/process_time               145 ms          145 ms            1
../1/process_time               144 ms          144 ms            1
../1/process_time               144 ms          144 ms            1
../1/process_time_mean          144 ms          144 ms           10
../1/process_time_median        144 ms          144 ms           10
../1/process_time_stddev      0.419 ms        0.417 ms           10
../1/process_time_min           144 ms          144 ms           10

The second time, I let the thread sleep for 100 ms before executing the loop. As expected, the wall time is about 100 ms higher. But in addition, the CPU time also increases by about 30 ms:

CODE WITH SLEEP_FOR:

int test(int iterations) {
  int a;
  for (int i = 0; i < iterations; i++) {
    a = i * 200 + 100;
  }
  return a;
}

void runBenchmark(benchmark::State &state) {
  for (auto _ : state) {
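    // Note: the sleep happens inside the timed loop, so it is included in the measured wall time of each iteration.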
    std::this_thread::sleep_for(100ms);
    test(100000000);
  }
}
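
(For the sleep variant to compile, a couple of extra includes and the chrono literals are needed; this boilerplate is implied rather than shown above:)

#include <chrono>   // std::chrono duration types and literals
#include <thread>   // std::this_thread::sleep_for

using namespace std::chrono_literals;  // enables the 100ms literal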

BENCHMARK RESULTS WITH SLEEP_FOR:
                                Time            CPU               Iterations
../1/process_time               275 ms          175 ms            1
../1/process_time               273 ms          173 ms            1
../1/process_time               272 ms          172 ms            1
../1/process_time               276 ms          176 ms            1
../1/process_time               274 ms          173 ms            1
../1/process_time               271 ms          171 ms            1
../1/process_time               271 ms          171 ms            1
../1/process_time               273 ms          173 ms            1
../1/process_time               276 ms          176 ms            1
../1/process_time               274 ms          173 ms            1
../1/process_time_mean          274 ms          173 ms           10
../1/process_time_median        274 ms          173 ms           10
../1/process_time_stddev       1.75 ms         1.75 ms           10
../1/process_time_min           275 ms          175 ms           10

What might be the explanation for that? When I first encountered this behaviour in another program I am working on, I expected some kind of scheduling issue. But since the benchmark clearly shows that the CPU time itself increases, I really have no idea what is going on.

The CPU the code is running on is an Intel(R) Core(TM) i5-8250U CPU @ 1.60GHz.

EDIT: Here is an additional image from my original program. It shows the execution of three neural networks; sleep_for is only executed before running the first network. In particular, the first layers of the first network run extremely slowly in comparison.

[image: execution-trace]

EDIT: The code has been compiled using llvm-8 with the -O2 optimization flag on an Ubuntu 18.04 machine.

Kind regards, lpolari

L Polari
  • Putting a thread to sleep involves more work inside the kernel and its scheduler than you might think. Not to mention the user-space time needed for the call to `sleep_for` itself. This extra work isn't free. – Some programmer dude Jul 27 '20 at 11:47
  • `sleep_for` sleeps for **at least** the requested time, not exactly the time you ask for – 463035818_is_not_an_ai Jul 27 '20 at 11:52
  • What is your CPU performance governor? – n. m. could be an AI Jul 27 '20 at 11:54
  • @Someprogrammerdude 30ms is way way way too much for this overhead. – n. m. could be an AI Jul 27 '20 at 11:54
  • You might find this question interesting: https://stackoverflow.com/questions/17325888/c11-thread-waiting-behaviour-stdthis-threadyield-vs-stdthis-thread – Steyrix Jul 27 '20 at 11:55
  • By the way, you have [*undefined behavior*](https://en.wikipedia.org/wiki/Undefined_behavior) in your code, as you will have arithmetic signed integer overflow with `i * 200`. – Some programmer dude Jul 27 '20 at 11:55
  • Oh and please clarify: How did you build your code? With or without optimizations enabled? Never do benchmarking on unoptimized code. And how do you do the measuring? Without a [mcve] it's going to be really hard to say anything definitive. Also please take some time to read [the help pages](http://stackoverflow.com/help), take the SO [tour], read [ask], as well as [this question checklist](https://codeblog.jonskeet.uk/2012/11/24/stack-overflow-question-checklist/). – Some programmer dude Jul 27 '20 at 11:56
  • @n.'pronouns'm. Yeah it seems overly large. Unfortunately we don't have enough information to do more than guess. – Some programmer dude Jul 27 '20 at 12:00
  • Actually the increase in CPU time is really due to the slower execution after sleep_for has already returned. I just edited my question and added a screenshot from the original program where I encountered that problem first hand. It is a neural network executed on the CPU three times; the sleep only takes place before the first execution. The execution of the first network starts just in time after those 100 ms, but the first layers of that network run much slower than the ones of the following two. – L Polari Jul 27 '20 at 12:08
  • Cache invalidation due to context switch? – freakish Jul 27 '20 at 12:16
  • I thought about cache invalidation too. But even when the code is executed for the first time we still see those significant differences, and I think the cache should not matter in that case. – L Polari Jul 27 '20 at 12:49
  • Sleeping lets the CPU's clock speed drop down to idle, from where it takes some time to jump back up. This hurts the early part of the first timed region after a sleep. The effect might be smaller with `energy_performance_preference=performance`, resulting in a jump back up to max turbo nearly instantly after your process re-awakens. With a trivial loop like this (small code footprint, all data access optimized away), even a deep sleep that emptied caches should be fine. – Peter Cordes Jul 27 '20 at 13:03
  • I should maybe have posted that as an answer + a link, instead of duplicate + comment. Your test does run for long enough to ramp up to max speed before the next sleep so it's qualitatively a little different. Let me know if that would be better. – Peter Cordes Jul 27 '20 at 13:36
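
To check the clock-ramp-up explanation suggested in the comments above, one can watch the frequency the kernel reports for a core right after the sleep returns. The sketch below is only illustrative: it assumes a Linux cpufreq sysfs layout (as exposed by the intel_pstate driver on a CPU like this i5-8250U), the process is not pinned so cpu0 may not be the core actually running the code, and the sampling loop is ad hoc:

#include <chrono>
#include <fstream>
#include <iostream>
#include <thread>

using namespace std::chrono_literals;

// Read the currently reported frequency of cpu0 in kHz (0 if the file cannot be read).
long readCurFreqKHz() {
  std::ifstream f("/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq");
  long khz = 0;
  f >> khz;
  return khz;
}

int main() {
  std::this_thread::sleep_for(100ms);          // same sleep as in the benchmark

  // Sample the reported frequency a number of times right after waking up;
  // with a governor that ramps up slowly, the values should climb towards max turbo.
  for (int s = 0; s < 50; ++s) {
    std::cout << readCurFreqKHz() << " kHz\n";
  }
}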

0 Answers