
I'm writing a computationally intensive program with VB.NET 2010 and I wish to optimise speed. I find that the operators AndAlso and OrElse are anomalously slow if the result of the operation is assigned to a class-level variable. For example, while the statements

a = _b AndAlso _c  
_a = a  

take about 6 machine cycles between them in the compiled exe, the single statement

_a = _b AndAlso _c  

takes about 80 machine cycles. Here _a, _b and _c are Private Boolean variables of Form1, and the statements in question are in an instance procedure of Form1, of which a is a local Boolean variable.

I cannot find why the single statement takes so long. I have explored it using NetReflector down to the level of the CIL code, which looks good:

Instruction               Explanation                              Stack  
00: ldarg.0               Push Me (ref to current inst of Form1)   Me  
01: ldarg.0               Push Me                                  Me, Me  
02: ldfld bool Form1::_b  Pop Me, read _b and push it              _b, Me  
07: brfalse.s 11          Pop _b; if false, branch to 11           Me  
09: ldarg.0               (_b true) Push Me                        Me, Me  
0a: ldfld bool Form1::_c  (_b true) Pop Me, read _c and push it    _c, Me  
0f: brtrue.s 14           (_b true) Pop _c; if true, branch to 14  Me  
11: ldc.i4.0              (_b, _c not both true) Push result 0     result, Me  
12: br.s 15               Jump unconditionally to 15               result, Me  
-----  
14: ldc.i4.1              (_b, _c both true) Push result 1         result, Me  
15: stfld bool Form1::_a  Pop result and Me; write result to _a    (empty)  
1a:

Can anyone shed any light on why the statement _a = _b AndAlso _c takes 80 machine cycles instead of the predicted 5 or so?

I'm using Windows XP with .NET 4.0 and Visual Studio Express 2010. I measured the times with a frankly dirty snippet of my own which basically uses a Stopwatch object to time a For-Next loop with 1000 iterations containing the code in question and compare it with an empty For-Next loop; it includes one useless instruction in both loops to waste a few cycles and prevent processor stalling. Crude but good enough for my purposes.
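A minimal sketch of that kind of measurement might look like the following. It is not the exact snippet used; the module name, the `_sink` counter and the conversion to nanoseconds are assumptions made for illustration. The fields here live in a Module, so this only shows the shape of the measurement. To reproduce the slow case, the loops would have to sit in an instance method of Form1 and use its own fields.

```vbnet
Imports System
Imports System.Diagnostics

Module TimingSketch
    ' Hypothetical stand-ins for the fields discussed in the question.
    Private _a, _b, _c As Boolean
    ' The "useless instruction" executed in both loops.
    Private _sink As Integer

    Sub Main()
        Const Iterations As Integer = 1000
        _b = True : _c = True

        ' Baseline loop: only the filler instruction.
        Dim sw As Stopwatch = Stopwatch.StartNew()
        For i As Integer = 1 To Iterations
            _sink += 1
        Next
        sw.Stop()
        Dim emptyTicks As Long = sw.ElapsedTicks

        ' Test loop: filler instruction plus the statement under test.
        sw = Stopwatch.StartNew()
        For i As Integer = 1 To Iterations
            _sink += 1
            _a = _b AndAlso _c
        Next
        sw.Stop()
        Dim testTicks As Long = sw.ElapsedTicks

        ' Convert the tick difference to nanoseconds per iteration.
        Dim nsPerIteration As Double =
            (testTicks - emptyTicks) * 1000000000.0 / Stopwatch.Frequency / Iterations
        Console.WriteLine("Extra time per iteration: {0:F2} ns", nsPerIteration)
    End Sub
End Module
```

Multiplying the nanosecond figure by the processor's clock rate in GHz gives an approximate cycle count. A release build run outside the debugger is assumed.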

Eric P Smith
  • Sorry if this comment seems off to some, but if I was looking for computational speed and measuring time in cycles -- I probably would not be using VB.NET or .NET in general. – TyCobb Jan 21 '15 at 17:31
  • maybe if you post more of the code, we can offer other suggestions for improving efficiency. – Jeremy Jan 21 '15 at 17:37
  • Have you tried using just And? Especially since you are working with Boolean variables. – the_lotus Jan 21 '15 at 17:40
  • @the_lotus Wouldn't that actually be slower? `And` evaluates both sides. – TyCobb Jan 21 '15 at 17:45
  • @TyCobb I understand. I'm a hobby programmer, age 65, I know VB.Net thoroughly and I haven't (at present) the appetite to learn C++ – Eric P Smith Jan 21 '15 at 17:52
  • @TyCobb if the two sides are just Boolean variables, then doing a binary AND might be faster than having to evaluate whether one or both sides have to be taken into account. – the_lotus Jan 21 '15 at 17:54
  • @Jeremy: I deliberately stripped the code to the smallest that makes my point. I can work-around easily, but I like to understand the tools I use. – Eric P Smith Jan 21 '15 at 17:56
  • My only theory would be some strangeness with assigning values to the heap, where object allocated memory is stored, versus assigning values to the stack, where function declared variables are stored (or so they have told us). Beyond that, it will take an IL wizard, and someone who knows the deep magic underlying .NET, the runtime optimizer, and the VM to fully understand what is going on. – Jeremy Jan 21 '15 at 18:06
  • The execution times may vary also if the code is in debug mode instead of release. – David - Jan 21 '15 at 20:47

1 Answer


There are two factors at play here that make this code slow. You cannot see this from the IL, only the machine code can give you insight.


The first is a general one associated with the AndAlso operator. It is a short-circuiting operator: the right-hand operand is not evaluated if the left-hand operand evaluates to False. This requires a branch in the machine code. Branching is one of the slowest things a processor can do; it must guess at the branch up front to avoid the risk of having to flush the pipeline. If it guesses wrong, it takes a major perf hit. Very well covered in this post. The typical perf loss if the tested variable is highly random, and the branch therefore poorly predicted, is around 500%.

You avoid this risk by using the And operator instead, which doesn't require a branch in the machine code. It is just a single instruction; AND is implemented by the processor. There is no point in favoring AndAlso in an expression like this, since nothing goes wrong if the right-hand operand gets evaluated. Not applicable here, but even if the IL shows a branch, the jitter might still make the machine code branch-less with a CMOV (conditional move) instruction.
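To make the difference between the two operators concrete, here is a small sketch. The `Check` helper and the module name are hypothetical and exist only to make operand evaluation visible. For plain Boolean fields such as _b and _c there is no side effect, so both operators produce the same result.

```vbnet
Imports System

Module AndVersusAndAlso
    ' Hypothetical helper with a visible side effect, used only to show
    ' which operands actually get evaluated.
    Function Check(name As String, value As Boolean) As Boolean
        Console.WriteLine("evaluated " & name)
        Return value
    End Function

    Sub Main()
        ' AndAlso short-circuits: the right-hand side is skipped
        ' because the left-hand side is False.
        Dim r1 As Boolean = Check("left", False) AndAlso Check("right", True)

        ' And always evaluates both operands before combining them.
        Dim r2 As Boolean = Check("left", False) And Check("right", True)

        ' The results agree; only the amount of evaluation differs.
        Console.WriteLine(r1 = r2)
    End Sub
End Module
```

Whether And is actually faster in a given loop depends on how predictable the operands are, so it is worth timing both forms.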


But most significant in your case is that the Form class inherits from the MarshalByRefObject class. The inheritance chain is MarshalByRefObject > Component > Control > ScrollableControl > ContainerControl > Form.

MBRO is treated specially by the Just-in-Time compiler: the code might be working with a proxy for the class object, with the real object living in another AppDomain or on another machine. A proxy is transparent to the jitter for almost any kind of member of the class, since they are implemented as simple method calls. Fields are the exception: they cannot be proxied because a field is accessed with a memory read/write, not a method call. If the jitter cannot prove that the object is local, it is forced to call into the CLR, using helper methods named JIT_GetFieldXxx() and JIT_SetFieldXxx(). The CLR knows whether the object reference is a proxy or the real deal and deals with the difference. The overhead is quite substantial; 80 cycles sounds about right.

There is not much you can do about this as long as the variables are members of your Form class. Moving them into a helper class is the workaround.
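A minimal sketch of the helper-class workaround, assuming a hypothetical `Flags` class and a `Compute` method (neither name comes from the question):

```vbnet
Imports System.Windows.Forms

' Hypothetical plain class. It does not derive from MarshalByRefObject,
' so its fields can never sit behind a proxy.
Friend NotInheritable Class Flags
    Public A As Boolean
    Public B As Boolean
    Public C As Boolean
End Class

Public Class Form1
    Inherits Form

    ' The form keeps a single reference to the helper object.
    Private ReadOnly _flags As New Flags()

    Private Sub Compute()
        ' Fetch the helper once into a local, then work on its fields.
        Dim f As Flags = _flags
        f.A = f.B AndAlso f.C
    End Sub
End Class
```

Copying the reference into a local before a hot loop is an assumption about what keeps the inner statements cheap; measure it in your own build.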

Hans Passant
  • Nice! Thank you for your smartness :) – Jeremy Jan 21 '15 at 20:31
  • Wonderful! Thank you so much. When I move the variables to a helper class the instruction takes an average of 3½ cycles instead of 80. – Eric P Smith Jan 21 '15 at 20:41
  • Nice result. Q+A can't get better than this. Please update your question and spend a few words on how you measured, not enough programmers do this. – Hans Passant Jan 22 '15 at 13:24