5

Is there a way to remove the array bounds check in C#?

here is what I want to achieve:

public static int F(int[] M, int i) 
{
    return M[i]; // I can guarantee that [i] will never be outside of [0, M.Length]
}

Before this function call I have a logic which already does check for the bounds (with some extra logic into it). The thing I want to remove are following lines:

Program.F(Int32[], Int32)
    L0000: sub rsp, 0x28
    L0004: cmp edx, [rcx+8]           ; I don't need this line
    L0007: jae short L0015            ; I don't need this line
    L0009: movsxd rax, edx
    L000c: mov eax, [rcx+rax*4+0x10]
    L0010: add rsp, 0x28
    L0014: ret
    L0015: call 0x00007ffc8877bc70    ; I don't need this line
    L001a: int3                       ; I don't need this line

Question

Is there a way of removing those instructions?

Note

  • I tried to put an if check with a hope that the compiler will get that but it made the situation even worse.
public static int G(int[] M, int i) 
{
    if (i >= 0 && i < M.Length)
        return M[i];

    return -1;
}

this generates:

Program.G(Int32[], Int32)
    L0000: sub rsp, 0x28
    L0004: test edx, edx
    L0006: jl short L001f
    L0008: mov eax, [rcx+8]
    L000b: cmp eax, edx
    L000d: jle short L001f
    L000f: cmp edx, eax
    L0011: jae short L0029
    L0013: movsxd rax, edx
    L0016: mov eax, [rcx+rax*4+0x10]
    L001a: add rsp, 0x28
    L001e: ret
    L001f: mov eax, 0xffffffff
    L0024: add rsp, 0x28
    L0028: ret
    L0029: call 0x00007ffc8877bc70
    L002e: int3

as you can see it didn't help.

  • What I can do is: using unsafe:
public static unsafe int H(int* M, int i) 
{
    return M[i];
}

this generates what I was looking for:

Program.H(Int32*, Int32)
    L0000: movsxd rax, edx
    L0003: mov eax, [rcx+rax*4]
    L0006: ret

But I sadly can't enable unsafe for my project. Is there a solution in "non-unsafe" world?

  • 1
    Have you profiled the code and concluded that the bounds check actually slows things down considerably? https://stackoverflow.com/questions/16713076/array-bounds-check-efficiency-in-net-4-and-above – trenki Apr 04 '21 at 08:38
  • @trenki yep, the `unsafe` version is faster, than the regular one. But as I said It'll be hard for me to enable unsafe in my project. It is also hard to include the benchmark because it has lot of dependencies and clearing them out and including them into my question would take too much time (+ I don't think that changing the code would give us accurate results here). –  Apr 04 '21 at 08:43
  • Does this answer your question? [Array bounds check efficiency in .net 4 and above](https://stackoverflow.com/questions/16713076/array-bounds-check-efficiency-in-net-4-and-above) – tymtam Apr 04 '21 at 08:58
  • Such short method gets inlined during compilation and there's no overhead [SharpLab](https://sharplab.io/#v2:EYLgxg9gTgpgtADwGwBYA0AXEBDAzgWwB8ABABgAJiBGAOgCUBXAOwwEt8YaBhCfAB1YAbGFADKIgG6swMXAG4AsAChlAelUBtALIwMACwgATAJL9BACh36jpvoIDyfNhCa4aAQQDmn2LlysJGGMmQVYmMM8ASgBdNU0rAxMzS11E2wcnVhc3ADkIYNDwpijYpWokcjCMcgAxcyqNaPItNEqWSsjyZQBvZXJ+ygB2Zo1WaMUlAF9lZXK26tEGfDqGpuwoSJ6+garyXCXyAF5yUgmB8gAzaHJ69tYjk7lK8gAecnWaABkYYv0n1gA1ADNkpzr1QecBvt8OQAcc6utWqxImcBtMIf1iMNoRN0TMylQKrtFvgACKsWBgDC3DCNd4bLYY+Z7A7HU7bfpXKA3Xb3Nn/V70r4/Tx/SpAkFgjmQ6Gw47rUbjaXo85Yln4XFAA==). – JL0PD Apr 04 '21 at 09:15
  • @JL0PD yes, but you still have the bounds check there. Am I wrong? Not in your example maybe (I think because you specified the Length of array), but if you replace arr.Length with e.g. 10 then you will get the check there. If I'm not wrong. –  Apr 04 '21 at 09:17
  • I think I have to try to enable the `unsafe` mode. Anyway, thank you all. –  Apr 04 '21 at 09:21
  • @Hrant, if you pass length as argument, than it's worth to try use indexing `for (int i = length; i > 0; i++)`. In this case JIT will check only on first access, later going to have no bound checks [sharplab](https://sharplab.io/#v2:EYLgxg9gTgpgtADwGwBYA0AXEBDAzgWwB8ABABgAJiBGAOgCUBXAOwwEt8YaBhCfAB1YAbGFADKIgG6swMXAG4AsAChlAelUBtALIwMACwgATAJL9BACh36jpvoIDyfNhCa4aAQQDmn2LlysJGGMmQVYmMM8ASgBdNU0rAxMzS11E2wcnVhc3ADkIYNDwpijYpWokcjCMcgAxcyqNaPItNEqWSsjyZQBvZXJ+ygB2Zo1WaMUlAF9lZXK26tEGfDqGpuwoVqryYWL9SJ6+ga3cJfIAXnJSCYHyADNocnr21nPtmF29OUryAD5Lr9YcDg+yUN16oJuAxO+HIAGoLnV1ptItcBtMIf1iMNoRN0UA===) – JL0PD Apr 04 '21 at 09:26
  • @JL0PD maybe I'm wrong, but I think it does check it every time as you can see the line `jg short L000e` jumps to `L000e` and in the next line we have `L0011: jae short L0026` which is the "bounds check". Correct me if I'm wrong. –  Apr 04 '21 at 09:32
  • 1
    @Hrant. You're correct. I'm not used to read asm so didn't catch it. Done some benchmarking 1_000_000 ints: direct sum `0 .. ar.Length` took 803μs, fsum `length - 1 .. 0` 891μs and without check optimizations `0 .. length` 918μs on my PC – JL0PD Apr 04 '21 at 10:01
  • I thought I'd read that current C# JITs could hoist a bounds-check out of a loop. So you still get one, instead of one-per-iter. Is that not the case? Looking at the stand-alone code for a function that does one array indexing operation is probably not the place to optimize, instead it probably makes more sense to look at a loop. (Unless your real use-case involves some kind of function-pointer or polymorphism or other stuff, or if the indices aren't from a counted loop that would make the checks easy to optimize out.) – Peter Cordes Apr 04 '21 at 17:16

3 Answers3

6

Actually there's the way. Stumbled upon it in csFastFloat repository.

Idea here is to use MemoryMarshall.GetArrayDataReference to get reference to first item in array and then add shift to get actual value:

[MethodImpl(MethodImplOptions.AggressiveInlining)]
static T FastAccessValue<T>(T[] ar, int index)
{
       ref T tableRef = ref MemoryMarshal.GetArrayDataReference(ar);
       return Unsafe.Add(ref tableRef, (nint)index);
}

which is safe(?) equivalent of unsafe version

[MethodImpl(MethodImplOptions.AggressiveInlining)]
static unsafe T FastAccessValueUnsafe<T>(T[] ar, int index) where T : unmanaged
{
     fixed(T* ptr = ar)
     {
         return ptr[index];
     }
}

without limitation to only unmanaged structs.

With unsafe access it's even performs 10% faster on large data (over million items)

public int SumUnsafe(int[] ints, int length)
{
    int sum = 0;
    for (int i = 0; i < length; i++)
    {
        sum += FastAccessValue(ints, i);
    }
    return sum;
}
public int SumDirect(int[] ints, int length)
{
    int sum = 0;
    for (int i = 0; i < ints.Length; i++)
    {
        sum += ints[i];
    }
    return sum;
}
Method ints length Mean Error StdDev Code Size
SumDirect Int32[100000] 100000 80.13 μs 0.748 μs 0.700 μs 29 B
SumUnsafe Int32[100000] 100000 81.99 μs 0.535 μs 0.446 μs 33 B
SumDirect Int32[1000000] 1000000 854.73 μs 5.216 μs 4.624 μs 29 B
SumUnsafe Int32[1000000] 1000000 795.10 μs 2.680 μs 2.238 μs 33 B
SumDirect Int32[10000000] 10000000 10,104.72 μs 27.199 μs 22.712 μs 29 B
SumUnsafe Int32[10000000] 10000000 9,126.06 μs 30.329 μs 26.886 μs 33 B

Benchmark located in this gist

JL0PD
  • 3,698
  • 2
  • 15
  • 23
  • But slower over smaller array sizes, and also large code-size? What asm does they compile to? Doing the addressing mode manually? – Peter Cordes Apr 06 '21 at 05:11
  • 1
    @PeterCordes It adds two dead-end operations: `cmp` and `mov` https://pastebin.com/dzRJdLR4 – JL0PD Apr 06 '21 at 05:19
  • I think you forgot to use your second param `length` in your loop. It'll make a difference as discussed in comments. –  Apr 06 '21 at 05:22
  • @Hrant It was intentional to show difference between *ideal* for JIT code and unsafe one – JL0PD Apr 06 '21 at 05:23
  • @JL0PD Uh ok. I see. –  Apr 06 '21 at 05:24
  • Ok, neither version has any bounds-checking inside the loop, only for checking for positive length once before the loop. But your point is that `SumDirect` would have bounds checking if you used the separate `length` arg instead of the container's `ints.Length`. – Peter Cordes Apr 06 '21 at 05:33
  • I suspect your speedup for large arrays is just a side-effect of testing order, e.g. the first loop has to pay for page faults? [Idiomatic way of performance evaluation?](https://stackoverflow.com/a/60293070) For the versions you show, `SumUnsafe` has no reason to be faster, and your testing shows it isn't for small arrays. It's the same asm with a really weird stray `cmp`, and re-copying the array base address before use in the [inefficient indexed addr mode](https://stackoverflow.com/questions/26046634). – Peter Cordes Apr 06 '21 at 05:40
  • 1
    @PeterCordes if he would've used the param `length` (which was my original example) then we should get more efficient behavior here. Correct me if I'm wrong. –  Apr 06 '21 at 05:50
  • @PeterCordes. I've lost any belief in prediction of performance by looking at ASM after I've found that storing value in temporary variable before using it increases performance 1.5x and this happens only on Intel [sample](https://sharplab.io/#v2:EYLgxg9gTgpgtADwGwBYA0AXEBDAzgWwB8ABARiQAIBLAOwwoGUBXfAClowG0BdC7KAJQBYAFABvURSnU6FXCwoBeCgAYA3JOkAzaBXayqS1WuoUAPHygA6ADIwaAcwwALE1QDU74SOkUJP32l5fAp3ZX5OKm4NAKkAX00pYgB2ORYYhJFRMkoORhYAdSoXADV+KmxgABsYfS5efm9/Xzzgo3VEih0oPTzDZXVTC35beydXak9vX2bAqTyANyMIqJi5qTawigW16UzfFLT8DKA==) – JL0PD Apr 06 '21 at 06:03
  • @JL0PD how is the second function faster when we have an additional `MOV` there? –  Apr 06 '21 at 06:10
  • @Hrant Yeah, sharplab confirms you get a bounds-check inside the loop if you use the `len` arg. The C# JIT also seems much dumber than an ahead-of-time compiler; it makes no sense that using an `int v = ar[i];` temporary would make it `mov`-load into a register instead of memory-source `add`. Whichever one is better, it should always do that. If that causes a 1.5x performance effect, that's probably specific to something going on in that loop. Possibly avoiding having the `cmp/jcc` touch a 32-byte boundary, avoiding falling afoul of the JCC erratum workaround on Skylake. – Peter Cordes Apr 06 '21 at 06:10
  • @JL0PD: Obviously all these loops are kinda dumb, using an `int` index that has to get sign-extended to pointer width every iteration in case it overflows and wraps to negative. Hopefully C# would do better with an index using C#'s equivalent of `size_t`. – Peter Cordes Apr 06 '21 at 06:11
  • I really wish we had an AoT compiler instead of JIT. –  Apr 06 '21 at 06:13
  • @PeterCordes we don't have anything like `size_t`. The `sizeof` operator in `C#` returns an `int`. I guess you could use the new `nint` which stands for native integer. –  Apr 06 '21 at 06:22
  • @Hrant: So you can't avoid loops that sign-extend a counter every iteration? That's horrible. Or is there some kind of range-for syntax to loop over a container without a specific index variable, like C++ `for (auto &v : my_container) { sum += v; }`? – Peter Cordes Apr 06 '21 at 06:24
  • @PeterCordes there is `foreach`, but under the hood it converts to regular `for`. I think it's better for us to move to conversations, but I don't know how to create one – JL0PD Apr 06 '21 at 06:32
  • @PeterCordes we got the `foreach` keyword which does that, but I'm not a big fan of it, because it works only for arrays (basically when the compiler sees an array inside `foreach` then it converts it to a `for` loop). For non-array scenarios: It iterates in "linked list fashion" (no indexing is involved only pointers and it iterates until the last pointer which points to NULL). I'm not an expert, correct me if I'm wrong. –  Apr 06 '21 at 06:32
  • @Hrant: ugh, even that doesn't help. [Sharplab shows](https://sharplab.io/#v2:EYLgxg9gTgpgtADwGwBYA0AXEBDAzgWwB8ABARiQAIBLAOwwoGUBXfAClowG0BdC7KAJQBYAFABvURSnU6FXCwoBeCgAYA3JOkAzaBXayqS1WuoUAPHygA6ADIwaAcwwALE1QDU74SOkUJP32l5fAp3ZX5OKm4NAKkAX00pYgB2ORYYhJFRMkoORhYAdSoXADV+KmxgABsYfS5efm9/Xzzgo3VEih1YbDBnPTyANxlLb19mwKk2sIpBmN9M3xS0/AygA) that you still get movsxd; and a separate `mov` load, exactly like the `int v = ar[i];` loop with an int counter, including `inc edx` not `inc rdx`. (Which is presumably how it's treated internally.) – Peter Cordes Apr 06 '21 at 06:37
  • @PeterCordes yeah, we don't have C/C++ like optimizations. :c –  Apr 06 '21 at 06:40
1

According to this blog:

https://devblogs.microsoft.com/dotnet/performance_improvements_in_net_7/#bounds-check-elimination

PGO feature in .net 7 greatly optimises boundary checks. However I doubt anyone can figure how would it work in your case in theory because I think its non-deterministic.

I would set the following params and then benchmark with and without:

DOTNET_TieredPGO=1
DOTNET_ReadyToRun=0
Boppity Bop
  • 9,613
  • 13
  • 72
  • 151
0

Just checked these two patterns on .NET 8.0:

int G1(int[] M, int i)
{
    return (uint)i < M.Length ? M[i] : -1;
}

int G2(int[] M, int i)
{
    if (i >= 0 && i < M.Length)
        return M[i];
    return -1;
}

and the codegen doesn't contain bound checks:

; Method Foo:G1(int[],int):int:this (FullOpts)
       mov      eax, dword ptr [rdx+08H]
       cmp      eax, r8d
       ja       SHORT G_M59182_IG05
       mov      eax, -1
       ret      
G_M59182_IG05:
       mov      eax, r8d
       mov      eax, dword ptr [rdx+4*rax+10H]
       ret      
; Total bytes of code: 22


; Method Foo:G2(int[],int):int:this (FullOpts)
       test     r8d, r8d
       jl       SHORT G_M35821_IG05
       mov      eax, dword ptr [rdx+08H]
       cmp      eax, r8d
       jle      SHORT G_M35821_IG05
       mov      eax, r8d
       mov      eax, dword ptr [rdx+4*rax+10H]
       ret      
G_M35821_IG05:
       mov      eax, -1
       ret      
; Total bytes of code: 27
EgorBo
  • 6,120
  • 3
  • 35
  • 40