I had a small benchmark to check how much faster/slower adding `int`s is in comparison to `long`s. My assumption was that `int` should be faster as on x64 two of them fit in one CPU register (in contrast to a 64-bit wide `long`). To my surprise, they behave more or less the same.
What surprised me most is that adding `int`s and returning a `long` was the fastest variant on my machine (MacBook M1 Pro, so an ARM chip).
private const int Iterations = 1_000_000;

[Benchmark(Baseline = true)]
[Arguments(10, 20, 30)]
public int AddIntReturnInt(int a, int b, int c)
{
    int result = 0;
    for (var i = 0; i < Iterations; i++)
        result += a + b + c;
    return result;
}

[Benchmark]
[Arguments(10, 20, 30)]
public long AddIntReturnLong(int a, int b, int c)
{
    long result = 0;
    for (var i = 0; i < Iterations; i++)
        result += a + b + c;
    return result;
}

[Benchmark]
[Arguments(10L, 20L, 30L)]
public long AddLongReturnLong(long a, long b, long c)
{
    long result = 0;
    for (var i = 0; i < Iterations; i++)
        result += a + b + c;
    return result;
}
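As a sanity check, here is a minimal sketch of the same three loops with the BenchmarkDotNet attributes removed, so it runs as a plain console program. The class name `SanityCheck` and the `Main` method are only for illustration and are not part of the benchmark. It is meant to confirm that all three variants compute the same sum for the arguments used, and that the sum (60 × 1,000,000) stays well below `int.MaxValue`, so overflow should not be what separates the timings.

```csharp
using System;

public static class SanityCheck
{
    private const int Iterations = 1_000_000;

    // int operands, int accumulator
    public static int AddIntReturnInt(int a, int b, int c)
    {
        int result = 0;
        for (var i = 0; i < Iterations; i++)
            result += a + b + c;
        return result;
    }

    // int operands, long accumulator: a + b + c is computed as an int,
    // then widened to long when it is added to result
    public static long AddIntReturnLong(int a, int b, int c)
    {
        long result = 0;
        for (var i = 0; i < Iterations; i++)
            result += a + b + c;
        return result;
    }

    // long operands, long accumulator
    public static long AddLongReturnLong(long a, long b, long c)
    {
        long result = 0;
        for (var i = 0; i < Iterations; i++)
            result += a + b + c;
        return result;
    }

    public static void Main()
    {
        Console.WriteLine(AddIntReturnInt(10, 20, 30));      // 60000000
        Console.WriteLine(AddIntReturnLong(10, 20, 30));     // 60000000
        Console.WriteLine(AddLongReturnLong(10L, 20L, 30L)); // 60000000
        Console.WriteLine(int.MaxValue);                     // 2147483647
    }
}
```

Since the results are identical, the difference in the timings would have to come from the machine code the JIT generates for each loop rather than from the arithmetic itself.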
Results:
BenchmarkDotNet=v0.13.2, OS=macOS Monterey 12.6.1 (21G217) [Darwin 21.6.0]
Apple M1 Pro, 1 CPU, 10 logical and 10 physical cores
.NET SDK=7.0.100
[Host] : .NET 7.0.0 (7.0.22.51805), Arm64 RyuJIT AdvSIMD
DefaultJob : .NET 7.0.0 (7.0.22.51805), Arm64 RyuJIT AdvSIMD
| Method | a | b | c | Mean | Error | StdDev | Ratio |
|------------------ |--- |--- |--- |---------:|--------:|--------:|------:|
| AddIntReturnInt | 10 | 20 | 30 | 935.5 us | 2.09 us | 1.95 us | 1.00 |
| AddIntReturnLong | 10 | 20 | 30 | 318.1 us | 0.74 us | 0.61 us | 0.34 |
| AddLongReturnLong | 10 | 20 | 30 | 933.6 us | 2.12 us | 1.98 us | 1.00 |
My question is how this behavior can be explained. The IL code isn't even smaller when returning a `long` (for example, it doesn't contain fewer bounds checks or the like).
EDIT 1: I updated the benchmark to run 1 million times instead of only 1000.