I have a call site cache made up of an array of CachePage instances. Each cache page has a readonly array of tokens that is used to determine whether that page is the correct one to use.

internal class CachePage
{
    internal CachePage(Token[] tokens)
    {
        this.tokens = tokens;
    }
    private readonly Token[] tokens;
    // other members don't matter...
}

To simplify matters: the class has several CheckTokens methods that take varying numbers of parameters. Without loss of generality, they all look like this.

public bool CheckTokens(Token g0, Token g1, Token g2)
{
    return (g2 == tokens[2] && g1 == tokens[1] && g0 == tokens[0]);
}

I iterate backward through the elements of the array so as to incur only one bounds check. Or so I thought. When I look at the disassembly, I see that a bounds check is actually performed for each comparison.

However, if I change the method like so, the extra bounds checks are eliminated.

public bool CheckTokens(Token g0, Token g1, Token g2)
{
    var t = tokens;
    return (g2 == t[2] && g1 == t[1] && g0 == t[0]);
}

Why are the extra bounds checks added? Doesn't the readonly modifier tell the JIT that this private field cannot be mutated?

This is a low-level cache so the less time spent determining if this is the right path the better.

EDIT: This is for 64-bit .NET 4.5; I tried both RyuJIT and the normal JIT.

Michael B
  • Please state CLR version, and bitness and if using RyuJIT :) – leppie Jul 17 '14 at 15:15
  • No, the bounds check does *not* get eliminated in the x86 and the x64 jitter. That would require an optimizer that can remember state across sub-expressions, an optimization that costs space and time that a jitter cannot afford. Avoid confusing it with the jitter caching the value of the array's Length property, easy to do in the x64 jitter since it has enough registers. – Hans Passant Jul 17 '14 at 15:43
  • @HansPassant, from inspecting the output disassembly, it only does one comparison against the cached length when I use a local variable, thus 4 total comparisons in this example case. When I rely on a field reference, it performs 6 comparisons. I'm admittedly no expert in amd64 assembly. – Michael B Jul 17 '14 at 16:22

1 Answer

I work in the world of real-time trading and looked into this whilst trying to squeeze every last bit of performance out of an application.

As Hans said in the question comments, there are some optimisations that the JITters choose to ignore. One of these is enregistering readonly member variables which are read multiple times within a method. Whilst conceptually this would seem a simple one to implement, recognising that the same member variable is read multiple times is not trivial.

Put simply, it's a tradeoff, and in the vast majority of cases the potential benefit isn't worth the cost to the JITter. If your performance tuning has got to the point of micro-optimisations, then you can explicitly enable the JITter to enregister a member variable by manually lifting it into a local variable. This has the knock-on effect of enabling the JITter to perform other optimisations, such as eliminating bounds checks.
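As a minimal sketch of that manual lifting, here are the field-based and local-based versions side by side. The Token class is a hypothetical stand-in (the question does not show it, so only reference equality is assumed), and the method names CheckTokensField and CheckTokensLocal are mine. Both methods return the same results; whether the redundant bounds checks actually disappear depends on the JIT in use, so confirm it in the disassembly rather than taking it on trust.

```csharp
using System;

// Hypothetical stand-in for the question's Token type; only reference equality is assumed.
internal sealed class Token { }

internal sealed class CachePage
{
    private readonly Token[] tokens;

    internal CachePage(Token[] tokens)
    {
        this.tokens = tokens;
    }

    // Reads the field for every comparison, as in the question's first version.
    public bool CheckTokensField(Token g0, Token g1, Token g2)
    {
        return g2 == tokens[2] && g1 == tokens[1] && g0 == tokens[0];
    }

    // Lifts the field into a local once, giving the JIT a single array
    // reference (and length) that it can keep in a register.
    public bool CheckTokensLocal(Token g0, Token g1, Token g2)
    {
        var t = tokens;
        return g2 == t[2] && g1 == t[1] && g0 == t[0];
    }
}

internal static class Program
{
    private static void Main()
    {
        var a = new Token();
        var b = new Token();
        var c = new Token();
        var page = new CachePage(new[] { a, b, c });

        Console.WriteLine(page.CheckTokensField(a, b, c)); // True
        Console.WriteLine(page.CheckTokensLocal(a, b, c)); // True
        Console.WriteLine(page.CheckTokensLocal(c, b, a)); // False
    }
}
```

The highest index is still read first in both versions, so an array that is too short fails on the first access either way; the local copy changes only how many times the field is loaded.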

0b101010