While working on benchmarking some code, I found that its execution time would vary with even the most innocuous code changes.
I have attempted to boil down the code below to the most minimal test case, but it is still rather lengthy (for which I apologize). Changing virtually anything largely affects the benchmark results.
#include <string>
#include <vector>
#include <iostream>
#include <random>
#include <chrono>
#include <functional>
constexpr double usec_to_sec = 1000000.0;
// Simple convenience timer
class Timer
{
std::chrono::high_resolution_clock::time_point start_time;
public:
Timer() : start_time(std::chrono::high_resolution_clock::now()) { }
int64_t operator()() const {
return static_cast<int64_t>(
std::chrono::duration_cast<std::chrono::microseconds>(
std::chrono::high_resolution_clock::now()-start_time).count()
);
}
};
// Convenience random number generator
template <typename T>
class RandGen
{
mutable std::default_random_engine generator;
std::uniform_int_distribution<T> distribution;
constexpr unsigned make_seed() const {
return static_cast<unsigned>(std::chrono::system_clock::now().time_since_epoch().count());
}
public:
RandGen(T min, T max) : generator(make_seed()), distribution(min, max) { }
T operator ()() { return distribution(generator); }
};
// Printer class
class Printer
{
std::string filename;
template <class S>
friend Printer &operator<<(Printer &, S &&s);
public:
Printer(const char *filename) : filename(filename) {}
};
template <class S>
Printer &operator<<(Printer &pm, S &&s) {
std::cout << s;
return pm;
}
// +------------+
// | Main Stuff |
// +------------+
void runtest(size_t run_length)
{
static RandGen<size_t> word_sz_generator(10, 20);
static RandGen<int> rand_char_generator(0, 25);
size_t total_char_count = 0;
std::vector<std::string> word_list;
word_list.reserve(run_length);
Printer printer("benchmark.dat");
printer << "Running test... ";
Timer timer; // start timer
for (auto i = 0; i < run_length; i++) {
size_t word_sz = word_sz_generator();
std::string word;
for (auto sz = 0; sz < word_sz; sz++) {
word.push_back(static_cast<char>(rand_char_generator())+'a');
}
word_list.emplace_back(std::move(word));
total_char_count += word_sz;
}
int64_t execution_time_usec = timer(); // stop timer
printer << /*run_length*/ word_list.size() << " words, and "
<< total_char_count << " total characters, were built in "
<< execution_time_usec/usec_to_sec << " seconds.\n";
}
int main(int argc, char **argv)
{
constexpr size_t iterations = 30;
constexpr size_t run_length = 50000000;
for (auto i = 0; i < iterations; i++)
runtest(run_length);
return EXIT_SUCCESS;
}
The 1st class, Timer
, is just a small convenience class (intentionally not well-featured, for brevity) for timing the code.
I tried to do without the 2nd class RandGen
(which just generates random values), but any attempt to exclude this from the test code made the problem auto-magically disappear. So, I suspect the issue has something to do with it. But I can't figure out how.
The 3rd class Printer
seems entirely unnecessary for this question, but again, including it seems to exacerbate the issue.
So, now we're down to main()
(which just runs the test) and runtest()
.
runtest()
is hideous, so please don't look at it from a "clean code" point-of-view. Changing it in any way (ex. moving the inner for loop
to its own function) results in a change in benchmark results. The simplest, and most perplexing example is the last line:
printer << /*run_length*/ word_list.size() << " words, and "
<< total_char_count << " total characters, were built in "
<< execution_time_usec/usec_to_sec << " seconds.\n";
In the line above, run_length
and word_list.size()
are the same. The size of vector word_list
is defined by run_length
. But, if I run the code as-is, I get an average execution time of 9.8 seconds, whereas if I uncomment run_length
and comment-out word_list.size()
, the execution time actually increases to an average of 10.6 seconds. I can't fathom how such an insignificant code change could affect the timing of the whole program to such an extent.
In other words...
9.8 seconds:
printer << /*run_length*/ word_list.size() << " words, and "
<< total_char_count << " total characters, were built in "
<< execution_time_usec/usec_to_sec << " seconds.\n";
10.6 seconds:
printer << run_length /*word_list.size()*/ << " words, and "
<< total_char_count << " total characters, were built in "
<< execution_time_usec/usec_to_sec << " seconds.\n";
I have repeated the exercise of commenting and uncommenting the variables noted above, and re-running the benchmarks, many times. The benchmarks are repeatable and consistent - i.e. they are consistently 9.8 seconds and 10.6 seconds, respectively.
The code output looks like this, for the two cases:
Running test... 50000000 words, and 750000798 total characters, were built in 9.83379 seconds. Running test... 50000000 words, and 749978210 total characters, were built in 9.84541 seconds. Running test... 50000000 words, and 749996688 total characters, were built in 9.87418 seconds. Running test... 50000000 words, and 749995415 total characters, were built in 9.85704 seconds. Running test... 50000000 words, and 750017699 total characters, were built in 9.86186 seconds. Running test... 50000000 words, and 749998680 total characters, were built in 9.83395 seconds. ... Running test... 50000000 words, and 749988517 total characters, were built in 10.604 seconds. Running test... 50000000 words, and 749958011 total characters, were built in 10.6283 seconds. Running test... 50000000 words, and 749994387 total characters, were built in 10.6374 seconds. Running test... 50000000 words, and 749995242 total characters, were built in 10.6445 seconds. Running test... 50000000 words, and 749988379 total characters, were built in 10.6543 seconds. Running test... 50000000 words, and 749969532 total characters, were built in 10.6722 seconds. ...
Any information on what would cause this discrepancy would be greatly appreciated.
Notes:
- Even removing the unused
std::string filename
member object from thePrinter
class yields different benchmark results - where doing so, eliminates (or reduces to insignificant proportions) the difference between the two benchmarks provided above. - This does not seem to be an issue when compiling with g++ (on Ubuntu). Although, I can’t say this definitively; my tests with Ubuntu were in a VM on the same Windows machine, where the VM perhaps did not have access to all of the resources and processor enhancements.
- I am using Visual Studio Community 2017 (version 15.7.4)
- Compiler version: 19.14.26431
- All tests and reported results are Release Build, 64-bit
- System: Win10, i7-6700K @ 4.00 GHz, 32 GB RAM