A previous SO question raised the issue of which idiom is better in terms of execution-time efficiency:

[ (var := exp) > 0 ] whileTrue: [ ... ]

versus

[ var := exp. 
  var > 0 ] whileTrue: [ ... ]

Intuitively, it seems the first form could be more efficient at execution time, because it saves fetching one additional statement (present in the second form). Is this true in most Smalltalks?

Trying it out with two naive benchmarks:

| var acc |
var := 10000.
acc := 0.
[ [ (var := var / 2) < 0  ] whileTrue: [ acc := acc + 1 ] ] bench.

| var acc |
var := 10000.
acc := 0.
[ [ var := var / 2. var < 0  ] whileTrue: [ acc := acc + 1 ] ] bench

This reveals no major difference between the two versions.
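For reference, here is a sketch of a variant whose loop body actually iterates (in the snippets above the condition is false on the very first test, so the whileTrue: block never runs). It assumes Pharo, where #bench is the standard block-benchmarking method:

```smalltalk
"A sketch of a benchmark whose loop actually iterates (assumes Pharo's
#bench). For the second form, replace the condition block with
[ var := var - 1. var > 0 ]."
| acc |
acc := 0.
[ | var |
  var := 10000.
  [ (var := var - 1) > 0 ] whileTrue: [ acc := acc + 1 ] ] bench.
```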

Any other opinions?

user1000565
  • Do you repeat that enough times to make a measurable benchmark? I don't know Smalltalk (just benchmarking and assembly language), but it looks like your loop only runs log2(10000) times, which is very very short. (Or is that 1 time, because `10000/2 < 0` is false?). The first test is whether its time scales linearly with a repeat-count, if you have this inside a loop that runs it repeatedly. The second is whether time scales at all with `var`. IDK how compiled vs. interpreted the implementation is, but with a compile-time constant arg it could just optimize to `acc += constant`. – Peter Cordes Jul 07 '18 at 05:23
  • As in many other languages, I would think these two cases would likely compile/interpret to the same thing since the compiler or interpreter would be smart enough to realize that you're using the result of the assignment right after the assignment is being made. – lurker Jul 08 '18 at 12:07

1 Answer


So the question is: What should I use to achieve a better execution time?

temp := <expression>.
temp > 0

or

(temp := <expression>) > 0

In cases like this one, the best way to arrive at a conclusion is to go down one step in the level of abstraction. In other words, we need a better understanding of what's happening behind the scenes.

The executable part of a CompiledMethod is represented by its bytecodes. When we save a method, what we are doing is compiling it into a series of low level instructions for the VM to be able to execute the method every time it is invoked. So, let's take a look at the bytecodes of each one of the cases above.

Since <expression> is the same in both cases, let's reduce it drastically to eliminate noise. Also, let's put our code in a method so as to have a CompiledMethod to play with:

Object >> m
  | temp |
  temp := 1.
  temp > 0

Now, let's search CompiledMethod and its superclasses for some message that would show us the bytecodes of Object >> #m. The selector should contain the substring bytecodes, right?

...

Here it is: #symbolicBytecodes! Now let's evaluate (Object >> #m) symbolicBytecodes to get:

pushConstant: 1
popIntoTemp: 0
pushTemp: 0
pushConstant: 0
send: >
pop
returnSelf

Note by the way how our temp variable has been renamed to Temp: 0 in the bytecodes language.

Now let's repeat the exercise with the other form

Object >> m
  | temp |
  (temp := 1) > 0

and get:

pushConstant: 1
storeIntoTemp: 0
pushConstant: 0
send: >
pop
returnSelf

The difference is

popIntoTemp: 0
pushTemp: 0

versus

storeIntoTemp: 0

What this reveals is that the two cases handle temp on the stack differently. In the first case, the result of our <expression> is popped from the execution stack into temp, and then temp is pushed again to restore the stack: a pop followed by a push of the same value. In the second case, instead, no pop or push happens; the value stays on the stack and is simply stored into temp.

So the conclusion is that in the first case we generate two cancelling instructions: a pop followed by a push.

This also explains why the difference is so hard to measure: push and pop instructions have direct translations into machine code and the CPU will execute them really fast.

Note, however, that nothing prevents the compiler from automatically optimizing the code by realizing that pop + push is equivalent to storeInto. With such an optimization, both Smalltalk snippets would result in exactly the same machine code.
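If you want to check whether your compiler already performs this optimization, one possible sketch is to compile both variants and compare their bytecodes. Here #m1 and #m2 are hypothetical selectors holding the two forms discussed above, and #symbolicBytecodes is the Pharo message used earlier:

```smalltalk
"Hypothetical check: compile the two variants side by side on Object.
If the printed bytecodes match, the compiler already folds
pop + push into a single store."
Object compile: 'm1 | temp | temp := 1. temp > 0'.
Object compile: 'm2 | temp | (temp := 1) > 0'.
(Object >> #m1) symbolicBytecodes printString
  = (Object >> #m2) symbolicBytecodes printString
```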

Now you should be able to decide which form you prefer. In my opinion, such a decision should only take into account the programming style you like better. Taking execution time into consideration is irrelevant because the difference is minimal, and it could easily be reduced to zero by implementing the optimization we just discussed. By the way, that would be an excellent exercise for those willing to explore the low-level realms of the unparalleled Smalltalk language.

Leandro Caniglia
    In VisualWorks both your code-examples compile to the same byte-code, so Leandro's suggestion to automatically optimize the code does in fact already take place in VisualWorks. As a result the decompilation of both statements results in the (var := exp) > 0. Personally I prefer to separate the statements as it is much easier to understand. Not directly related to this benchmarking question, but a general advice: please try to avoid while/for loops in Smalltalk, they're too hard to understand in general, compared to typical iteration methods like #do:, #detect:, #select: and #collect:. – Karsten Jul 07 '18 at 13:23
  • @Karsten thanks a lot for checking this in VisualWorks. – Leandro Caniglia Jul 07 '18 at 15:11