I have a longer code that was cut down as much as possible while keeping the issue alive. My code runs an MCMC computation for different parameter values. For some combinations of values, the code runs much longer, about 100x slower than on typical cases. However, it shouldn't, because the number of operations does not depend on parameter values.
I am running this on an AMD64 Linux box with glibc-2.17 compiled with GCC 4.8.1 on Gentoo. Compile flags do not matter as it appears. I also tested it on a different Gentoo box with an older AMD64 processor and the results were the same.
There were a bunch of tests I did:
- I have tried debugging with Valgrind and it found no memory issues or other nasty things.
- Secondly, I have tried running the code with the problematic parameter values fixed, but the slowness was not encountered.
- I have tried putting
sleep(4)
between iterations, but nothing changed.
The problem shows when the iteration hits k = 1
, i = j = 0
, which translates into mu[0] = -0.05
, mu[1] = -0.05
and mu[2] = 0.05
. As I said, using this fixed value for all iterations eliminates the issue I am seeing.
Here are things that eliminate the problem:
- Changing the limits.
- Fixing the
mu[]
coefficients. - Removing
dW3
from the computation. - Removing
rand()
. - Removing computation of
q
. - Removing the update of
s[j]
.
I have read quit a bit about slowpow
, thus tried eliminating exp
by writing my own version of it. This solves the problems I am having with this MWE, but not when the reimplemented exp
is placed in the production code.
Question: what is causing the semi-random slowness?
Code of the MWE follows. All help and suggestions as how to proceed would be greatly appreciated.
Note: This code is compiled with g++
although it is essentially C
. Changing compiler doesn't change anything.
Regarding branch prediction: Removing one of the if
statements by using
q = exp(dW);
q = q / (1.0 + q);
no matter what is the value of dW
doesn't change code's behavior; if this indeed is due to branch prediction, it would have to be due to the second if
.
#include <cstdio>
#include <cstdlib>
#include <cmath>
inline int index(int const i, int const j, int const n)
{
return (i + n) % n + ((j + n) % n) * n;
}
void get_sample(int* s, int n, double* mu)
{
for (int i = 0; i < 10 * n * n; i++)
{
int j = i % (n * n);
int x = j % n;
int y = (j - x) / n;
double dW1 = mu[0] * (s[index(x - 1, y, n)] + s[index(x + 1, y, n)] + s[index(x, y - 1, n)] + s[index(x, y + 1, n)]);
double dW2 = mu[1] * (s[index(x - 1, y - 1, n)] + s[index(x + 1, y - 1, n)] + s[index(x + 1, y + 1, n)] + s[index(x - 1, y + 1, n)]);
double dW3 = mu[2] * (s[index(x - 1, y, n)] * s[index(x - 1, y - 1, n)] * s[index(x, y - 1, n)] + s[index(x - 1, y, n)] * s[index(x - 1, y + 1, n)] * s[index(x, y + 1, n)]
+ s[index(x, y + 1, n)] * s[index(x + 1, y + 1, n)] * s[index(x + 1, y, n)] + s[index(x + 1, y, n)] * s[index(x + 1, y - 1, n)] * s[index(x, y - 1, n)]);
double dW = 2.0 * (dW1 + dW2 + dW3);
double q;
if (dW < 0.0)
{
q = exp(dW);
q = q / (1.0 + q);
}
else
{
q = exp(-dW);
q = 1.0 / (1.0 + q);
}
double p = ((double) rand()) / ((double) RAND_MAX);
if (p < q)
{
s[j] = 1;
}
else
{
s[j] = -1;
}
}
}
int main(int argc, char** argv)
{
double mu[3];
double limits[6] = {-0.05, 0.8, -0.05, 0.45, -0.45, 0.05};
int s[16];
for (int i = 0; i < 16; i++)
{
s[i] = -1;
}
for (int k = 0; k < 2; k++)
{
for (int j = 0; j < 2; j++)
{
for (int i = 0; i < 2; i++)
{
mu[0] = limits[0] + ((limits[1] - limits[0]) * i);
mu[1] = limits[2] + ((limits[3] - limits[2]) * j);
mu[2] = limits[4] + ((limits[5] - limits[4]) * k);
printf(" Computing (% .6lf, % .6lf, % .6lf)...\n", mu[0], mu[1], mu[2]);
for (int sample = 0; sample < 1000; sample++)
{
get_sample(s, 4, mu);
}
}
}
}
return 0;
}