0

I'm writing a high-performance component with a lot of int-double-int conversions, so I need to know the execution time between them.

static double ToDouble(int val) => (double)val;

static int ToInt(double val) => (int)val;

static void Main(string[] args) {
    const int TIMES = 1000_0000;
    Console.ReadLine();

    var t_0 = Stopwatch.GetTimestamp();
    for (int i = 0; i < TIMES; i++) {
        var val = ToInt(ToDouble(i));
    }
    var t_1 = Stopwatch.GetTimestamp();
    Console.WriteLine((t_1 - t_0) * 100_0000L / Stopwatch.Frequency); // 4002 Microseconds


    var t_2 = Stopwatch.GetTimestamp();
    for (int i = 0; i < TIMES; i++) {

    }
    var t_3 = Stopwatch.GetTimestamp();
    Console.WriteLine((t_3 - t_2) * 100_0000L / Stopwatch.Frequency); // 3997 Microseconds


    Console.ReadLine();
}

I found that int-double-int conversion is so fast that execution time is comparable to empty loops.

I think the code in the first loop is not executed at all, it is optimized by the compiler as an empty loop, is that true?

srnldai
  • 259
  • 1
  • 7
  • 9
    You can't run benchmarks with such code. In fact, I'd bet the compiler realized the code does nothing, calculated the values *in advance* and simply skipped the loops. – Panagiotis Kanavos Oct 30 '19 at 14:36
  • 1
    The way you use Stopwatch is wrong too. Stopwatch isn't a timer. It stores the value of a high frequency timer when you call `Start` and calculates the difference from the *current* HF timer value when you stop it or call `ElapsedTicks` or `Elapsed`. `Elapsed` is a TiimeSpan value calculated from ElapsedTicks, so there's no need to divide by the frequency. Use `StartNew` before each benchmark and `Elapsed` or ElapsedTicks when you want to check the elapsed time – Panagiotis Kanavos Oct 30 '19 at 14:38
  • 4
    *Even that* can return 0 ticks if the loop is so small it completes in nanoseconds. A far more valuable and useful way of benchmarking is to use [BenchmarkDotNet](https://github.com/dotnet/BenchmarkDotNet). It takes care to collect enough samples to overcome noise and startup variations, automatically collects *allocation and garbage collection* metrics and shows the real performance of a method. – Panagiotis Kanavos Oct 30 '19 at 14:40
  • 1
    You should use StopWatch.Start and Stop then Elapsed! –  Oct 30 '19 at 14:42
  • @PanagiotisKanavos - Actually, the `GetTimestamp` approach is fine. It is even prefered in some cases, because it avoids allocating a `Stopwatch` class instance. It also allows for timing across threads or processes, as long as the machine doesn't restart. (Basically, it exposes `QueryPerformanceCounter`.) Also, you are correct that `Elapsed` doesn't require dividing by the frequency, but both `GetTimestamp` and `ElapsedTicks` *do*, because the ticks of a stopwatch are not necessarily 100ns ticks like those used in `TimeSpan` or `DateTime`. – Matt Johnson-Pint Oct 30 '19 at 18:28
  • (I do agree about using BenchmarkDotNet for the reasons you stated though.) – Matt Johnson-Pint Oct 30 '19 at 18:34
  • @MattJohnson-Pint use Restart() then - that [simply resets the internal state](https://referencesource.microsoft.com/#system/services/monitoring/system/diagnosticts/Stopwatch.cs,110) and retrieves the current tick count. There's no reason to do manually what the instance methods already do - take the tick counts and calculate the difference – Panagiotis Kanavos Oct 31 '19 at 07:36
  • Sure, if you're in a single thread and can re-use the stopwatch instance. But there are many scenarios where that isn't the case. – Matt Johnson-Pint Oct 31 '19 at 19:06

1 Answers1

3

You should use StopWatch.Start and Stop then Elapsed!

const int TIMES = 100_000_000;

var chrono = new Stopwatch();
int val = 0;

chrono.Start();
for ( int i = 1; i <= TIMES; i++ )
  val = ToInt(ToDouble(i));
chrono.Stop();
Console.WriteLine(val);
Console.WriteLine(chrono.ElapsedMilliseconds.ToString());

chrono.Restart();
for ( int i = 1; i <= TIMES; i++ )
{
  var v1 = (double)i;
  val = (int)v1;
}
chrono.Stop();
Console.WriteLine(val);
Console.WriteLine(chrono.ElapsedMilliseconds.ToString());

chrono.Restart();
for ( int i = 1; i <= TIMES; i++ )
  val = i;
chrono.Stop();
Console.WriteLine(val);
Console.WriteLine(chrono.ElapsedMilliseconds.ToString());

chrono.Restart();
for ( int i = 1; i <= TIMES; i++ )
  ;
chrono.Stop();
Console.WriteLine(chrono.ElapsedMilliseconds.ToString());

Output with Debug mode:

729
270
194
218

Using Release build optimized:

84
61
57
31

The first loop in Debug IL:

  // value = ToInt(ToDouble(i));
  IL_0015: ldloc.2
  IL_0016: call float64 ConsoleApp.Program::ToDouble(int32)
  IL_001b: call int32 ConsoleApp.Program::ToInt(float64)
  IL_0020: stloc.1

The first loop in Release IL:

  // value = ToInt(ToDouble(i));
  IL_0012: ldloc.2
  IL_0013: call float64 ConsoleApp.Program::ToDouble(int32)
  IL_0018: call int32 ConsoleApp.Program::ToInt(float64)
  IL_001d: stloc.1

The second loop Debug:

  // double num = j;
  IL_0065: ldloc.s 5
  IL_0067: conv.r8
  IL_0068: stloc.s 6
  // value = (int)num;
  IL_006a: ldloc.s 6
  IL_006c: conv.i4
  IL_006d: stloc.1

The second loop Release:

  // value = (int)(double)j;
  IL_0054: ldloc.s 4
  IL_0056: conv.r8
  IL_0057: conv.i4
  IL_0058: stloc.1

Proc calls eat a lot of CPU ticks and this is the first thing to consider when optimizing, with loops and calculation.

The compiler optimization is mainly with the loop itself:

  • Debug:
// for (int i = 1; i <= 100000000; i++)
IL_0010: ldc.i4.1
IL_0011: stloc.2
// (no C# code)
IL_0012: br.s IL_0026
// loop start (head: IL_0026)
  //...
  // for (int i = 1; i <= 100000000; i++)
  IL_0022: ldloc.2
  IL_0023: ldc.i4.1
  IL_0024: add
  IL_0025: stloc.2
  // for (int i = 1; i <= 100000000; i++)
  IL_0026: ldloc.2
  IL_0027: ldc.i4 100000000
  IL_002c: cgt
  // (no C# code)
  IL_002e: ldc.i4.0
  IL_002f: ceq
  IL_0031: stloc.3
  IL_0032: ldloc.3
  IL_0033: brtrue.s IL_0014
// end loop
  • Release:
// for (int i = 1; i <= 100000000; i++)
IL_000c: ldc.i4.1
IL_000d: stloc.1
// (no C# code)
IL_000e: br.s IL_0020
// loop start (head: IL_0020)
  // ...
  // for (int i = 1; i <= 100000000; i++)
  IL_001c: ldloc.1
  IL_001d: ldc.i4.1
  IL_001e: add
  IL_001f: stloc.1
  // for (int i = 1; i <= 100000000; i++)
  IL_0020: ldloc.1
  IL_0021: ldc.i4 100000000
  IL_0026: ble.s IL_0010
// end loop

The loops in debug without the console.writeline(val):

  // value = ToInt(ToDouble(i));
  IL_0015: ldloc.2
  IL_0016: call float64 ConsoleApp.Program::ToDouble(int32)
  IL_001b: call int32 ConsoleApp.Program::ToInt(float64)
  IL_0020: stloc.1

  // double num = j;
  IL_0065: ldloc.s 5
  IL_0067: conv.r8
  IL_0068: stloc.s 6
  // value = (int)num;
  IL_006a: ldloc.s 6
  IL_006c: conv.i4
  IL_006d: stloc.1

  // value = k;
  IL_00b7: ldloc.s 8
  IL_00b9: stloc.1

  // nothing

The loops in release without the console.writeline(val):

  // ToInt(ToDouble(i));
  IL_0010: ldloc.1
  IL_0011: call float64 ConsoleApp.Program::ToDouble(int32)
  IL_0016: call int32 ConsoleApp.Program::ToInt(float64)
  IL_001b: pop

  // _ = (double)j;
  IL_004b: ldloc.3
  IL_004c: conv.r8
  IL_004d: pop

  // nothing

  // nothing
  • So your answer is that indeed an empty loop should take more time than the one with code in it? – Rotem Oct 30 '19 at 14:48
  • 2
    With optimization enabled, the timings will be identical. – Rotem Oct 30 '19 at 14:51
  • 3
    You can replace `chrono.Reset(); chrono.Start();` with a single `chrono.Restart();`. – itsme86 Oct 30 '19 at 14:52
  • 2
    This still doesn't address the elephant in the room - any meaningful profiling will want optimizations enabled, and because `val` is not used at all, the compiler will just delete that code and the two loops will be equivalent. – Rotem Oct 30 '19 at 14:54
  • @Rotem Answer updated. You will be suprised... so you may read the explaination at the end. –  Oct 30 '19 at 17:54