For a release build, the IL generated for the two approaches is almost identical.
Consider this code:
    static int test1()
    {
        int result = 0;
        int i;

        for (i = 0; i < 10; ++i)
            ++result;

        for (i = 0; i < 10; ++i)
            ++result;

        return result;
    }

    static int test2()
    {
        int result = 0;

        for (int i = 0; i < 10; ++i)
            ++result;

        for (int i = 0; i < 10; ++i)
            ++result;

        return result;
    }
This generates the following IL for a release build, which I have placed side-by-side for easier comparison:
    test1():                        test2():
    {                               {
        .maxstack 2                     .maxstack 2
        .locals init (                  .locals init (
            [0] int32 result,               [0] int32 result,
            [1] int32 i)                    [1] int32 i,
                                            [2] int32 V_2)
        L_0000: ldc.i4.0                L_0000: ldc.i4.0
        L_0001: stloc.0                 L_0001: stloc.0
        L_0002: ldc.i4.0                L_0002: ldc.i4.0
        L_0003: stloc.1                 L_0003: stloc.1
        L_0004: br.s L_000e             L_0004: br.s L_000e
        L_0006: ldloc.0                 L_0006: ldloc.0
        L_0007: ldc.i4.1                L_0007: ldc.i4.1
        L_0008: add                     L_0008: add
        L_0009: stloc.0                 L_0009: stloc.0
        L_000a: ldloc.1                 L_000a: ldloc.1
        L_000b: ldc.i4.1                L_000b: ldc.i4.1
        L_000c: add                     L_000c: add
        L_000d: stloc.1                 L_000d: stloc.1
        L_000e: ldloc.1                 L_000e: ldloc.1
        L_000f: ldc.i4.s 10             L_000f: ldc.i4.s 10
        L_0011: blt.s L_0006            L_0011: blt.s L_0006
        L_0013: ldc.i4.0                L_0013: ldc.i4.0
        L_0014: stloc.1                 L_0014: stloc.2
        L_0015: br.s L_001f             L_0015: br.s L_001f
        L_0017: ldloc.0                 L_0017: ldloc.0
        L_0018: ldc.i4.1                L_0018: ldc.i4.1
        L_0019: add                     L_0019: add
        L_001a: stloc.0                 L_001a: stloc.0
        L_001b: ldloc.1                 L_001b: ldloc.2
        L_001c: ldc.i4.1                L_001c: ldc.i4.1
        L_001d: add                     L_001d: add
        L_001e: stloc.1                 L_001e: stloc.2
        L_001f: ldloc.1                 L_001f: ldloc.2
        L_0020: ldc.i4.s 10             L_0020: ldc.i4.s 10
        L_0022: blt.s L_0017            L_0022: blt.s L_0017
        L_0024: ldloc.0                 L_0024: ldloc.0
        L_0025: ret                     L_0025: ret
    }                               }
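The only difference is that test2() uses a separate local slot (V_2) for the second loop counter; the instructions are otherwise the same. So there is no performance reason to avoid the version where 'i' is local to the loop, and you should choose it because that's better practice.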
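To illustrate the "better practice" point, here is a minimal sketch; the class and method names are made up for this example and are not part of the code above. With the counter declared outside the loop, its final value stays readable after the loop, whereas with a loop-local counter the same mistake does not compile.

```csharp
using System;

static class ScopeDemo   // hypothetical name, for illustration only
{
    // Counter declared outside the loop: it stays in scope afterwards.
    public static int OuterCounter()
    {
        int i;
        for (i = 0; i < 10; ++i) { }
        return i;   // compiles: 'i' is still visible here and holds its final value
    }

    // Counter declared in the for statement: its scope ends with the loop.
    public static int LoopLocalCounter()
    {
        int result = 0;
        for (int i = 0; i < 10; ++i)
            ++result;
        // return i;   // would not compile: 'i' does not exist in the current context
        return result;
    }

    static void Main()
    {
        Console.WriteLine(OuterCounter());
        Console.WriteLine(LoopLocalCounter());
    }
}
```

Keeping the counter scoped to the loop means the compiler catches accidental reuse rather than leaving it to be found at run time.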
However, the version with the loop counter declared outside the loops will, at most, be faster by the time needed to initialise one extra int to zero, which is pretty much negligible.
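If you want to check this on your own machine, a rough timing harness along the following lines may help. This is only a sketch: the `LoopTiming` class, the `Time` helper and the iteration counts are assumptions made for this example, and micro-benchmarks like this are noisy, so treat the numbers as indicative at best.

```csharp
using System;
using System.Diagnostics;

static class LoopTiming   // hypothetical harness, not part of the original code
{
    public static int Test1()
    {
        int result = 0;
        int i;
        for (i = 0; i < 10; ++i)
            ++result;
        for (i = 0; i < 10; ++i)
            ++result;
        return result;
    }

    public static int Test2()
    {
        int result = 0;
        for (int i = 0; i < 10; ++i)
            ++result;
        for (int i = 0; i < 10; ++i)
            ++result;
        return result;
    }

    // Calls f repeatedly and returns the elapsed Stopwatch ticks.
    public static long Time(Func<int> f, int iterations)
    {
        long sum = 0;                    // accumulate results so the calls are actually used
        Stopwatch sw = Stopwatch.StartNew();
        for (int n = 0; n < iterations; ++n)
            sum += f();
        sw.Stop();
        GC.KeepAlive(sum);
        return sw.ElapsedTicks;
    }

    static void Main()
    {
        const int iterations = 10000000; // arbitrary count chosen for this sketch

        // Warm-up calls so JIT compilation is not included in the timings.
        Time(Test1, 1000);
        Time(Test2, 1000);

        Console.WriteLine("test1: {0} ticks", Time(Test1, iterations));
        Console.WriteLine("test2: {0} ticks", Time(Test2, iterations));
    }
}
```

Run it as a release build without a debugger attached; otherwise the measurement will not reflect the optimised code discussed above.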