I want to get an idea of how fast the modulus (%) operator runs. I have set up a simple program to benchmark % being applied to randomly generated values. The time is measured in nanoseconds with a high-resolution clock. It often reports that 0 ns have elapsed. Obviously nothing happens instantaneously, so why would this be? If I increase the number of rounds to about 50,000, it usually takes about 1,000,000 ns, but 5,000 rounds always reports 0 ns. Am I measuring it wrong? What optimization is being done to allow for this?
#include <iostream>
#include <chrono>
#include <random>

void runTest(const int rounds, const int min, const int max);

int main()
{
    std::cout << "started" << std::endl;
    runTest(5000, 1000000, 2000000);
    return 0;
}

/*IN: number of rounds to run on the test, the min and max value to choose between for operands to mod
OUT: time taken (in nanoseconds) to complete each operation on the same randomly generated numbers*/
void runTest(const int rounds, const int min, const int max)
{
    std::random_device rd; // only used once to initialise (seed) engine
    std::mt19937 rng(rd()); // random-number engine used (Mersenne-Twister in this case)
    std::uniform_int_distribution<int> uni(min,max); // guaranteed unbiased
    std::chrono::nanoseconds durationNormalMod = std::chrono::nanoseconds::zero();
    std::chrono::nanoseconds durationFastMod = std::chrono::nanoseconds::zero();
    long long result = 0;
    for(auto i = 0; i < rounds; i++)
    {
        const int leftOperand = uni(rng);
        const int rightOperand = uni(rng);
        auto t1 = std::chrono::high_resolution_clock::now();
        long long x = (leftOperand % rightOperand);
        auto t2 = std::chrono::high_resolution_clock::now();
        //std::cout << "x: " << x << std::endl;
        result += x;
        durationNormalMod += std::chrono::duration_cast<std::chrono::nanoseconds>(t2 - t1);
    }
    std::cout << "duration of %: " << durationNormalMod.count() << std::endl;
    std::cout << "result: " << result << std::endl;//preventing optimization by using result
}
I compile with g++ prog.cpp -o prog.exe -O3.
I'm interested because I have a specific case where I can implement modulus using a different algorithm and I'm curious if it's faster.
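For comparison, one variant worth trying is to time the whole batch of operations with a single pair of clock reads, instead of wrapping each individual % in its own pair. The sketch below is only an illustration of that idea, not a verified fix: the function name `timeModBatch` and its `checksum` parameter are hypothetical, it swaps in `std::chrono::steady_clock` for the interval measurement, and it assumes `rounds` is positive and `min` is at least 1 so that no divisor is zero. The compiler is still free to transform the timed loop, so any number it reports should be read with caution.

```cpp
#include <chrono>
#include <random>
#include <vector>

// Hypothetical helper (not part of the program above): times `rounds`
// applications of % as one batch and returns the total in nanoseconds.
// Assumes rounds > 0 and min >= 1, so no divisor is ever zero.
// The sum of all results is written to `checksum` so the work is observable.
long long timeModBatch(int rounds, int min, int max, long long& checksum)
{
    std::random_device rd;
    std::mt19937 rng(rd());
    std::uniform_int_distribution<int> uni(min, max);

    // Generate the operands up front so the RNG cost stays outside the timed region.
    std::vector<int> left(rounds), right(rounds);
    for (int i = 0; i < rounds; i++) {
        left[i] = uni(rng);
        right[i] = uni(rng);
    }

    long long sum = 0;
    // One clock read before and one after the entire loop.
    const auto t1 = std::chrono::steady_clock::now();
    for (int i = 0; i < rounds; i++) {
        sum += left[i] % right[i];
    }
    const auto t2 = std::chrono::steady_clock::now();

    checksum = sum;
    return std::chrono::duration_cast<std::chrono::nanoseconds>(t2 - t1).count();
}
```

Calling `timeModBatch(5000, 1000000, 2000000, checksum)` and dividing the returned value by the number of rounds would give a rough per-operation average, provided the checksum is printed afterwards so the loop's result is actually used.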