Why the execution time of a million double-int conversion is the same as an empty loop?

Question

I'm writing a high-performance component with a lot of int-double-int conversions, so I need to know the execution time between them.

static double ToDouble(int val) => (double)val;

static int ToInt(double val) => (int)val;

static void Main(string[] args) {
    const int TIMES = 1000_0000;
    Console.ReadLine();

    var t_0 = Stopwatch.GetTimestamp();
    for (int i = 0; i < TIMES; i++) {
        var val = ToInt(ToDouble(i));
    }
    var t_1 = Stopwatch.GetTimestamp();
    Console.WriteLine((t_1 - t_0) * 100_0000L / Stopwatch.Frequency); // 4002 Microseconds


    var t_2 = Stopwatch.GetTimestamp();
    for (int i = 0; i < TIMES; i++) {

    }
    var t_3 = Stopwatch.GetTimestamp();
    Console.WriteLine((t_3 - t_2) * 100_0000L / Stopwatch.Frequency); // 3997 Microseconds


    Console.ReadLine();
}

I found that int-double-int conversion is so fast that execution time is comparable to empty loops.

I think the code in the first loop is not executed at all, it is optimized by the compiler as an empty loop, is that true?

You can't run benchmarks with such code. In fact, I'd bet the compiler realized the code does nothing, calculated the values *in advance* and simply skipped the loops. — Panagiotis Kanavos, Oct 30 '19 at 14:36
The way you use Stopwatch is wrong too. Stopwatch isn't a timer. It stores the value of a high frequency timer when you call `Start` and calculates the difference from the *current* HF timer value when you stop it or call `ElapsedTicks` or `Elapsed`. `Elapsed` is a TiimeSpan value calculated from ElapsedTicks, so there's no need to divide by the frequency. Use `StartNew` before each benchmark and `Elapsed` or ElapsedTicks when you want to check the elapsed time — Panagiotis Kanavos, Oct 30 '19 at 14:38
*Even that* can return 0 ticks if the loop is so small it completes in nanoseconds. A far more valuable and useful way of benchmarking is to use [BenchmarkDotNet](https://github.com/dotnet/BenchmarkDotNet). It takes care to collect enough samples to overcome noise and startup variations, automatically collects *allocation and garbage collection* metrics and shows the real performance of a method. — Panagiotis Kanavos, Oct 30 '19 at 14:40
@PanagiotisKanavos - Actually, the `GetTimestamp` approach is fine. It is even prefered in some cases, because it avoids allocating a `Stopwatch` class instance. It also allows for timing across threads or processes, as long as the machine doesn't restart. (Basically, it exposes `QueryPerformanceCounter`.) Also, you are correct that `Elapsed` doesn't require dividing by the frequency, but both `GetTimestamp` and `ElapsedTicks` *do*, because the ticks of a stopwatch are not necessarily 100ns ticks like those used in `TimeSpan` or `DateTime`. — Matt Johnson-Pint, Oct 30 '19 at 18:28
(I do agree about using BenchmarkDotNet for the reasons you stated though.) — Matt Johnson-Pint, Oct 30 '19 at 18:34
@MattJohnson-Pint use Restart() then - that [simply resets the internal state](https://referencesource.microsoft.com/#system/services/monitoring/system/diagnosticts/Stopwatch.cs,110) and retrieves the current tick count. There's no reason to do manually what the instance methods already do - take the tick counts and calculate the difference — Panagiotis Kanavos, Oct 31 '19 at 07:36
Sure, if you're in a single thread and can re-use the stopwatch instance. But there are many scenarios where that isn't the case. — Matt Johnson-Pint, Oct 31 '19 at 19:06

score 3 · Answer 1 · 2019-10-30T19:02:47.547

You should use StopWatch.Start and Stop then Elapsed!

const int TIMES = 100_000_000;

var chrono = new Stopwatch();
int val = 0;

chrono.Start();
for ( int i = 1; i <= TIMES; i++ )
  val = ToInt(ToDouble(i));
chrono.Stop();
Console.WriteLine(val);
Console.WriteLine(chrono.ElapsedMilliseconds.ToString());

chrono.Restart();
for ( int i = 1; i <= TIMES; i++ )
{
  var v1 = (double)i;
  val = (int)v1;
}
chrono.Stop();
Console.WriteLine(val);
Console.WriteLine(chrono.ElapsedMilliseconds.ToString());

chrono.Restart();
for ( int i = 1; i <= TIMES; i++ )
  val = i;
chrono.Stop();
Console.WriteLine(val);
Console.WriteLine(chrono.ElapsedMilliseconds.ToString());

chrono.Restart();
for ( int i = 1; i <= TIMES; i++ )
  ;
chrono.Stop();
Console.WriteLine(chrono.ElapsedMilliseconds.ToString());

Output with Debug mode:

Using Release build optimized:

The first loop in Debug IL:

  // value = ToInt(ToDouble(i));
  IL_0015: ldloc.2
  IL_0016: call float64 ConsoleApp.Program::ToDouble(int32)
  IL_001b: call int32 ConsoleApp.Program::ToInt(float64)
  IL_0020: stloc.1

The first loop in Release IL:

  // value = ToInt(ToDouble(i));
  IL_0012: ldloc.2
  IL_0013: call float64 ConsoleApp.Program::ToDouble(int32)
  IL_0018: call int32 ConsoleApp.Program::ToInt(float64)
  IL_001d: stloc.1

The second loop Debug:

  // double num = j;
  IL_0065: ldloc.s 5
  IL_0067: conv.r8
  IL_0068: stloc.s 6
  // value = (int)num;
  IL_006a: ldloc.s 6
  IL_006c: conv.i4
  IL_006d: stloc.1

The second loop Release:

  // value = (int)(double)j;
  IL_0054: ldloc.s 4
  IL_0056: conv.r8
  IL_0057: conv.i4
  IL_0058: stloc.1

Proc calls eat a lot of CPU ticks and this is the first thing to consider when optimizing, with loops and calculation.

The compiler optimization is mainly with the loop itself:

Debug:

// for (int i = 1; i <= 100000000; i++)
IL_0010: ldc.i4.1
IL_0011: stloc.2
// (no C# code)
IL_0012: br.s IL_0026
// loop start (head: IL_0026)
  //...
  // for (int i = 1; i <= 100000000; i++)
  IL_0022: ldloc.2
  IL_0023: ldc.i4.1
  IL_0024: add
  IL_0025: stloc.2
  // for (int i = 1; i <= 100000000; i++)
  IL_0026: ldloc.2
  IL_0027: ldc.i4 100000000
  IL_002c: cgt
  // (no C# code)
  IL_002e: ldc.i4.0
  IL_002f: ceq
  IL_0031: stloc.3
  IL_0032: ldloc.3
  IL_0033: brtrue.s IL_0014
// end loop

Release:

// for (int i = 1; i <= 100000000; i++)
IL_000c: ldc.i4.1
IL_000d: stloc.1
// (no C# code)
IL_000e: br.s IL_0020
// loop start (head: IL_0020)
  // ...
  // for (int i = 1; i <= 100000000; i++)
  IL_001c: ldloc.1
  IL_001d: ldc.i4.1
  IL_001e: add
  IL_001f: stloc.1
  // for (int i = 1; i <= 100000000; i++)
  IL_0020: ldloc.1
  IL_0021: ldc.i4 100000000
  IL_0026: ble.s IL_0010
// end loop

The loops in debug without the console.writeline(val):

  // value = ToInt(ToDouble(i));
  IL_0015: ldloc.2
  IL_0016: call float64 ConsoleApp.Program::ToDouble(int32)
  IL_001b: call int32 ConsoleApp.Program::ToInt(float64)
  IL_0020: stloc.1

  // double num = j;
  IL_0065: ldloc.s 5
  IL_0067: conv.r8
  IL_0068: stloc.s 6
  // value = (int)num;
  IL_006a: ldloc.s 6
  IL_006c: conv.i4
  IL_006d: stloc.1

  // value = k;
  IL_00b7: ldloc.s 8
  IL_00b9: stloc.1

  // nothing

The loops in release without the console.writeline(val):

  // ToInt(ToDouble(i));
  IL_0010: ldloc.1
  IL_0011: call float64 ConsoleApp.Program::ToDouble(int32)
  IL_0016: call int32 ConsoleApp.Program::ToInt(float64)
  IL_001b: pop

  // _ = (double)j;
  IL_004b: ldloc.3
  IL_004c: conv.r8
  IL_004d: pop

  // nothing

  // nothing

So your answer is that indeed an empty loop should take more time than the one with code in it? — Rotem, Oct 30 '19 at 14:48
You can replace `chrono.Reset(); chrono.Start();` with a single `chrono.Restart();`. — itsme86, Oct 30 '19 at 14:52
This still doesn't address the elephant in the room - any meaningful profiling will want optimizations enabled, and because `val` is not used at all, the compiler will just delete that code and the two loops will be equivalent. — Rotem, Oct 30 '19 at 14:54
@Rotem Answer updated. You will be suprised... so you may read the explaination at the end. — , Oct 30 '19 at 17:54

Why the execution time of a million double-int conversion is the same as an empty loop?

1 Answers1

Linked

Related