I'm experimenting with the warmup option of Benchee benchmarks, increasing the warmup time gradually. I expected better results with a longer warmup time, but I get the opposite.

For example, take the following script (adapted from the Benchee docs):

list = Enum.to_list(1..10_000)
map_fun = fn i -> [i, i * i] end

# Same benchmark five times; only the warmup time (in seconds) changes.
Benchee.run(
  %{"flat_map" => fn -> Enum.flat_map(list, map_fun) end},
  warmup: 0
)

Benchee.run(
  %{"flat_map" => fn -> Enum.flat_map(list, map_fun) end},
  warmup: 2
)

Benchee.run(
  %{"flat_map" => fn -> Enum.flat_map(list, map_fun) end},
  warmup: 4
)

Benchee.run(
  %{"flat_map" => fn -> Enum.flat_map(list, map_fun) end},
  warmup: 8
)

Benchee.run(
  %{"flat_map" => fn -> Enum.flat_map(list, map_fun) end},
  warmup: 16
)

I ran the above script several times and got similar results each time: the first run, with warmup: 0, was the fastest.

Operating System: macOS
CPU Information: Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
Number of Available Cores: 16
Available memory: 32 GB
Elixir 1.10.3
Erlang 23.0.2

Benchmark suite executing with the following configuration:
warmup: 0 ns
time: 5 s
memory time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 5 s

Benchmarking flat_map...

Name               ips        average  deviation         median         99th %
flat_map        1.70 K      588.45 μs    ±14.33%         563 μs     1017.17 μs
Operating System: macOS
CPU Information: Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
Number of Available Cores: 16
Available memory: 32 GB
Elixir 1.10.3
Erlang 23.0.2

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 7 s

Benchmarking flat_map...

Name               ips        average  deviation         median         99th %
flat_map        1.67 K      600.24 μs    ±18.69%         563 μs     1085.84 μs
Operating System: macOS
CPU Information: Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
Number of Available Cores: 16
Available memory: 32 GB
Elixir 1.10.3
Erlang 23.0.2

Benchmark suite executing with the following configuration:
warmup: 4 s
time: 5 s
memory time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 9 s

Benchmarking flat_map...

Name               ips        average  deviation         median         99th %
flat_map        1.66 K      602.44 μs    ±18.32%         564 μs     1085.14 μs
Operating System: macOS
CPU Information: Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
Number of Available Cores: 16
Available memory: 32 GB
Elixir 1.10.3
Erlang 23.0.2

Benchmark suite executing with the following configuration:
warmup: 8 s
time: 5 s
memory time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 13 s

Benchmarking flat_map...

Name               ips        average  deviation         median         99th %
flat_map        1.65 K      606.06 μs    ±17.35%      573.98 μs     1072.98 μs
Operating System: macOS
CPU Information: Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
Number of Available Cores: 16
Available memory: 32 GB
Elixir 1.10.3
Erlang 23.0.2

Benchmark suite executing with the following configuration:
warmup: 16 s
time: 5 s
memory time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 21 s

Benchmarking flat_map...

Name               ips        average  deviation         median         99th %
flat_map        1.66 K      601.32 μs    ±17.79%         573 μs        1081 μs

In other VMs, you usually get better performance after a warmup phase.

How does warmup work on the BEAM, and in Benchee in particular?

Thanks in advance, Humberto

Humberto
  • i9-9980HK has a [max turbo of 5GHz](https://ark.intel.com/content/www/us/en/ark/products/192990/intel-core-i9-9980hk-processor-16m-cache-up-to-5-00-ghz.html) for a TDP of 45W. It probably can't sustain that for long given thermal and power constraints ([Why can't my CPU maintain peak performance in HPC](https://stackoverflow.com/q/36363613) is an extreme case; a 4.5W to 6W chip), so your first 5-second interval might have gone through warm-up and into fall-off, if this particular workload makes a lot of heat. A short warm-up like 1 second for 1 or 2 sec of work might be the sweet spot. – Peter Cordes Jun 26 '20 at 20:19
  • Thanks for the reference @PeterCordes. I'll check it and try your suggestion. I'll let you know the results – Humberto Jun 26 '20 at 20:35

1 Answer

Warmup in Benchee does nothing more than run your measured function for a set amount of time without recording the results.
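To make that concrete, here is a minimal sketch of the idea, not Benchee's actual implementation. The `WarmupSketch` module and its `run_for/2` and `measure/3` functions are hypothetical names made up for this illustration; only `:timer.tc/1` and `System.monotonic_time/1` are real library calls.

```elixir
defmodule WarmupSketch do
  # Hypothetical helper, not part of Benchee.
  # Calls `fun` repeatedly until `ms` milliseconds have elapsed and
  # returns one duration (in microseconds) per call.
  def run_for(fun, ms) do
    deadline = System.monotonic_time(:millisecond) + ms
    do_run(fun, deadline, [])
  end

  defp do_run(fun, deadline, acc) do
    if System.monotonic_time(:millisecond) < deadline do
      {micros, _result} = :timer.tc(fun)
      do_run(fun, deadline, [micros | acc])
    else
      Enum.reverse(acc)
    end
  end

  # Warmup phase: same loop, but the timings are thrown away.
  # Measurement phase: the timings are kept.
  def measure(fun, warmup_ms, time_ms) do
    _discarded = run_for(fun, warmup_ms)
    run_for(fun, time_ms)
  end
end

list = Enum.to_list(1..10_000)
map_fun = fn i -> [i, i * i] end

samples = WarmupSketch.measure(fn -> Enum.flat_map(list, map_fun) end, 200, 500)
IO.puts("collected #{length(samples)} samples")
IO.puts("average: #{Enum.sum(samples) / max(length(samples), 1)} µs")
```

The only difference between the two phases is whether the timings are kept, so warmup can only help if earlier calls change the state that later calls run against.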

Benchee's creators point out that the warmup concept is taken from JIT-compiled languages, where the first run of the code compiles it and caches the result for future calls.

How can warmup affect benchmark results on the BEAM? That is impossible to answer in general, but let's look for caching mechanisms that could play a role:

  1. The atom table can be populated during the first function run, if the code uses atoms extensively or generates them on the fly.
  2. The code itself (the business logic) may have some caching mechanism that affects the benchmark.
  3. File descriptors can be opened, if the benchmarked code does IO that doesn't close them (for example files, HTTP requests, etc.). The page cache of the underlying OS (or, better said, the file system) can also be affected.
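As an illustration of the second point, here is a small sketch of business logic that caches its own result. The `CachedSquares` module and the ETS table name are hypothetical, chosen only for this example; the first call does the real work, and later calls are only a table lookup, so timing without warmup would mix the two.

```elixir
defmodule CachedSquares do
  # Hypothetical module for illustration: caches an expensive result in ETS.
  @table :cached_squares_demo

  def setup do
    :ets.new(@table, [:named_table, :public, :set])
    :ok
  end

  def sum_of_squares(n) do
    case :ets.lookup(@table, n) do
      # Cache hit: return the stored value without recomputing.
      [{^n, value}] ->
        value

      # Cache miss: compute, store, and return.
      [] ->
        value = Enum.reduce(1..n, 0, fn i, acc -> acc + i * i end)
        :ets.insert(@table, {n, value})
        value
    end
  end
end

CachedSquares.setup()

# The first call populates the cache; the second one only reads from it.
{cold, first} = :timer.tc(fn -> CachedSquares.sum_of_squares(1_000_000) end)
{warm, second} = :timer.tc(fn -> CachedSquares.sum_of_squares(1_000_000) end)

IO.puts("cold call: #{cold} µs, cached call: #{warm} µs")
```

In a case like this, a warmup phase would absorb the cold call so that the measured samples reflect only the cached path. The `flat_map` benchmark in the question has no such state, which is consistent with warmup making no positive difference there.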

If your code is straightforward and purely CPU-bound, warmup will just add a few degrees to your CPU and make your cooling system work harder, without any meaningful effect on the results.

Virviil
  • Your last paragraph is not accurate for modern CPUs that adjust their clock speed in response to demand. Several milliseconds of warm-up will get the CPU up to max turbo. Also, if your benchmark touches memory at all, the first run will page-fault in any lazily-mapped pages, and prime the TLBs and even data caches, as well as code caches and branch prediction. Most of these effects are a minor part of a long 5-second timed interval, but page faults for large enough buffers can be significant. See [Idiomatic way of performance evaluation?](https://stackoverflow.com/q/60291987) – Peter Cordes Jun 28 '20 at 20:12
  • Thanks for your replies, Virviil and Peter Cordes. I learned new things from them – Humberto Jun 29 '20 at 06:55