While doing some performance testing, I've run into a situation that I cannot seem to explain.
I have written the following C code:
#include <stdint.h>

/* element-wise product of two int32 arrays */
void multi_arr(int32_t *x, int32_t *y, int32_t *res, int32_t len)
{
    for (int32_t i = 0; i < len; ++i)
    {
        res[i] = x[i] * y[i];
    }
}
I use gcc to compile it, along with a test driver, into a single binary. I also use gcc to compile it by itself into a shared object which I call from C# via p/invoke. The intent is to measure the performance overhead of calling native code from C#.
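The managed side looks roughly like this (a simplified sketch, not the exact code from the repro; the library name and the C# method name are placeholders):

using System.Runtime.InteropServices;

static class Native
{
    // Hypothetical library name; adjust to match the .so produced by gcc.
    [DllImport("multi_arr", EntryPoint = "multi_arr")]
    public static extern void MultiArr(int[] x, int[] y, int[] res, int len);
}

Since int[] is blittable, the runtime pins the arrays and passes raw pointers to the native function, so there is no per-element marshalling on the call.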
In both C and C#, I create equal-length input arrays of random values and then measure how long multi_arr takes to run. In both languages I use the POSIX clock_gettime() call for timing, with the timing calls placed immediately before and after the call to multi_arr, so input preparation and other setup do not affect the results. I run 100 iterations and report both the average and the minimum times.
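The C# timing wrapper looks roughly like this (a simplified sketch; it assumes 64-bit Linux/glibc, where clock_gettime is exported from libc.so.6 and CLOCK_MONOTONIC is 1):

using System.Runtime.InteropServices;

static class Clock
{
    private const int CLOCK_MONOTONIC = 1; // Linux value

    [StructLayout(LayoutKind.Sequential)]
    private struct Timespec
    {
        public long tv_sec;   // seconds
        public long tv_nsec;  // nanoseconds
    }

    [DllImport("libc.so.6")]
    private static extern int clock_gettime(int clk_id, out Timespec ts);

    public static long Nanos()
    {
        Timespec ts;
        clock_gettime(CLOCK_MONOTONIC, out ts);
        return ts.tv_sec * 1000000000L + ts.tv_nsec;
    }
}

Inside the benchmark loop, the measured region is just the native call:

long start = Clock.Nanos();
Native.MultiArr(x, y, res, len);
long elapsed = Clock.Nanos() - start; // nanoseconds for this iteration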
Even though C and C# are executing the exact same function, C# comes out ahead about 50% of the time, usually by a significant amount. For example, for a len of 1,048,576, C#'s min is 768,400 ns vs C's min of 1,344,105 ns; C#'s average is 1,018,865 ns vs C's 1,852,880 ns. I've plotted a broader set of numbers in this graph (mind the log scales):
These results seem extremely wrong to me, but the effect is consistent across multiple runs. I've checked the generated assembly and the IL to verify correctness, and the bitness is the same on both sides. I have no idea what could be impacting performance to this degree. I've put a minimal reproduction example up here.
These tests were all run on Linux (KDE neon, based on Ubuntu Xenial) with dotnet-core 2.0.0 and gcc 5.0.4.
Has anyone seen this before?