
I am used to doing this:

do
    local a
    for i=1,1000000 do
        a = <some expression>
        <...> --do something with a
    end
end

instead of

for i=1,1000000 do
    local a = <some expression>
    <...> --do something with a
end

My reasoning is that creating a local variable 1000000 times is less efficient than creating it just once and reusing it on each iteration.

My question is: is this true, or is there another technical detail I am missing? I am asking because I don't see anyone doing this, but I am not sure whether that is because the advantage is too small or because it is in fact worse. By better I mean using less memory and running faster.

Yu Hao
Mandrill
  • "Creating a local variable" is something _you_ do when you write source code. What happens at runtime is, I assume, as unknown to you as it is to me. (Okay, I have played with `luac -l` and read about the VM instruction set a bit.) – Tom Blodget Jun 27 '15 at 02:38

4 Answers


Like any performance question, measure first. On a Unix system you can use `time`:

time lua -e 'local a; for i=1,100000000 do a = i * 3 end'
time lua -e 'for i=1,100000000 do local a = i * 3 end'

the output:

 real   0m2.320s
 user   0m2.315s
 sys    0m0.004s

 real   0m2.247s
 user   0m2.246s
 sys    0m0.000s

The more local version appears to be a small percentage faster in Lua, since it does not initialize `a` to `nil`. However, that is no reason to choose it; use the most local scope because it is more readable (this is good style in all languages: see this question asked for C, Java, and C#).

If you are reusing a table instead of creating it in the loop then there is likely a more significant performance difference. In any case, measure and favour readability whenever you can.
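The same comparison can also be made from within Lua itself using `os.clock`, which is standard and portable. This is a sketch I'm adding for illustration (the helper name `bench` and the iteration count are my own choices); absolute numbers vary by machine and Lua version, and only the relative comparison matters:

```lua
-- Minimal micro-benchmark sketch using os.clock (CPU time).
-- `bench` is a hypothetical helper name, not part of any library.
local function bench(f)
    local t0 = os.clock()
    f()
    return os.clock() - t0
end

local N = 10000000

-- Variant 1: the local lives outside the loop.
local outer_time = bench(function()
    local a
    for i = 1, N do a = i * 3 end
end)

-- Variant 2: the local is declared inside the loop body.
local inner_time = bench(function()
    for i = 1, N do local a = i * 3 end
end)

print(string.format("outer-local: %.3fs  inner-local: %.3fs",
                    outer_time, inner_time))
```

Run it a few times; on most machines the two variants land within noise of each other, which is consistent with the `time` measurements above.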

ryanpattison
  • Spot on. Readability is 100 times more important than _insignificant_ performance tweaks. – Steve Wellens Jun 27 '15 at 02:15
  • COLD RUN! Run `first && second` in the shell a few times and you'll see that neither is strictly faster: **1253**/1022, 1020/1022, 1023/1019, 1020/1021, 1022/1020, 1022/1022, 1028/1019, 1021/1021, 1021/1022, ... – user3125367 Jun 27 '15 at 15:11
  • Actually, there are in both cases two predefined slots in activation record and there is no difference in opcodes. – user3125367 Jun 27 '15 at 15:13
  • @user3125367 The opcodes only differ by an additional `LOADNIL` in the first, since `a` is default-initialized to `nil`. The point is that the performance difference is *insignificant* and the second is better for readability. – ryanpattison Jun 27 '15 at 15:22
  • You can see it in my answer now (and check it yourself). – user3125367 Jun 27 '15 at 15:29
  • @user3125367 actually it's `luac` 5.2.3 that generates the extra `LOADNIL` – ryanpattison Jun 27 '15 at 15:40
  • how to measure this on a Windows system? – Black Jun 30 '15 at 08:40
  • @EdwardBlack (I have not tried this) [Windows equivalent to UNIX time command](http://superuser.com/questions/228056/windows-equivalent-to-unix-time-command). – ryanpattison Jun 30 '15 at 12:54

I think there's some confusion about the way compilers deal with variables. From a high-level kind of human perspective, it feels natural to think of defining and destroying a variable to have some kind of "cost" associated with it.

Yet that's not necessarily the case to the optimizing compiler. The variables you create in a high-level language are more like temporary "handles" into memory. The compiler looks at those variables, translates them into an intermediate representation (something closer to the machine), and figures out where to store everything, predominantly with the goal of allocating registers (the most immediate form of memory for the CPU to use). Then it translates the IR into machine code, where the idea of a "variable" doesn't even exist; there are only places to store data (registers, cache, DRAM, disk).

This process includes reusing the same registers for multiple variables provided that they do not interfere with each other (provided that they are not needed simultaneously: not "live" at the same time).

Put another way, with code like:

local a = <some expression>

The resulting assembly could be something like:

load gp_register, <result from expression>

... or it may already have the result from some expression in a register, and the variable ends up disappearing completely (just using that same register for it).

... which means there's no "cost" to the existence of the variable. It just translates directly to a register which is always available. There's no "cost" to "creating a register", as registers are always there.

When you start creating variables at a broader (less local) scope, contrary to what you might expect, you may actually slow down the code. By doing so you're fighting against the compiler's register allocation, making it harder for the compiler to figure out which registers to allocate for what. In that case, the compiler might spill more variables onto the stack, which is less efficient and actually does have a cost attached. A smart compiler may still emit equally efficient code, but widening scope can actually make things slower. Helping the compiler here usually means using more local variables in smaller scopes, where you have the best chance for efficiency.

In assembly code, reusing the same registers whenever you can is efficient because it avoids stack spills. In high-level languages with variables, it's kind of the opposite: reducing the scope of a variable helps the compiler figure out which registers it can reuse, because a more local scope tells the compiler which variables are not live simultaneously.

Now there are exceptions when you start involving user-defined constructor and destructor logic in languages like C++ where reusing an object might prevent redundant construction and destruction of an object that can be reused. But that doesn't apply in a language like Lua, where all variables are basically plain old data (or handles into garbage-collected data or userdata).

The only case where you might see an improvement from using fewer, longer-lived variables is if that somehow reduces work for the garbage collector. But that's not going to be the case if you simply re-assign to the same variable. For that, you would have to reuse whole tables or userdata (without re-assigning). Put another way, reusing the same fields of a table without recreating the whole table might help in some cases, but reusing the variable used to reference the table is very unlikely to help and could actually hinder performance.
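A sketch of the kind of reuse that *can* matter, added for illustration (the function names are my own, not from the answer): allocating a fresh table every iteration produces garbage for the collector, whereas overwriting the fields of one reused table does not.

```lua
-- Allocates a fresh table on every iteration; each previous table
-- becomes garbage that the collector must eventually reclaim.
local function fresh_tables(n)
    local last
    for i = 1, n do
        last = { x = i, y = i * 2 }
    end
    return last
end

-- Reuses one table, overwriting the same fields; no per-iteration
-- allocation, so no per-iteration garbage.
local function reused_table(n)
    local t = {}
    for i = 1, n do
        t.x = i
        t.y = i * 2
    end
    return t
end
```

Both produce the same final values; the difference is only in how much garbage is created along the way, which you can observe with the standard `collectgarbage("count")` if you want to measure it.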

  • Indeed. When I look at `local a; for i=1,100000000 do a = i * 3 end` I can see that it doesn't do anything at all. I know compiler writers are smarter than I am, so I wouldn't be surprised if a C++ compiler optimized something like that into a NOP. – Tom Blodget Jun 27 '15 at 02:38

All local variables are "created" at compile (load) time and are simply indexes into the function activation record's locals block. Each time you define a local, that block grows by 1. Each time a do..end/lexical block ends, it shrinks back. The peak value is used as the total size:

function ()
    local a        -- current:1, peak:1
    do
        local x    -- current:2, peak:2
        local y    -- current:3, peak:3
    end
                   -- current:1, peak:3
    do
        local z    -- current:2, peak:3
    end
end

The above function has 3 local slots (determined at load, not at runtime).

Regarding your case, there is no difference in locals-block size; moreover, luac/5.1 generates identical listings (only the indexes change):

$  luac -l -
local a; for i=1,100000000 do a = i * 3 end
^D
main <stdin:0,0> (7 instructions, 28 bytes at 0x7fee6b600000)
0+ params, 5 slots, 0 upvalues, 5 locals, 3 constants, 0 functions
        1       [1]     LOADK           1 -1    ; 1
        2       [1]     LOADK           2 -2    ; 100000000
        3       [1]     LOADK           3 -1    ; 1
        4       [1]     FORPREP         1 1     ; to 6
        5       [1]     MUL             0 4 -3  ; - 3       // [0] is a
        6       [1]     FORLOOP         1 -2    ; to 5
        7       [1]     RETURN          0 1

vs

$  luac -l -
for i=1,100000000 do local a = i * 3 end
^D
main <stdin:0,0> (7 instructions, 28 bytes at 0x7f8302d00020)
0+ params, 5 slots, 0 upvalues, 5 locals, 3 constants, 0 functions
        1       [1]     LOADK           0 -1    ; 1
        2       [1]     LOADK           1 -2    ; 100000000
        3       [1]     LOADK           2 -1    ; 1
        4       [1]     FORPREP         0 1     ; to 6
        5       [1]     MUL             4 3 -3  ; - 3       // [4] is a
        6       [1]     FORLOOP         0 -2    ; to 5
        7       [1]     RETURN          0 1

// [n]-comments are mine.

user3125367

First note this: defining the variable inside the loop ensures that one iteration cannot see the value the previous iteration stored in it. Defining it before the for loop makes it possible to carry a value across multiple iterations, like any other variable not defined within the loop.
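The scoping difference described above is observable, not just a matter of style. A small sketch of my own to illustrate it:

```lua
-- An outer local carries its value across iterations:
local total = 0
for i = 1, 5 do
    total = total + i   -- sees the value left by the previous iteration
end
assert(total == 15)

-- An inner local is fresh each iteration and dies at the end of the body:
local sum = 0
for i = 1, 5 do
    local x = i         -- an independent variable each time around
    sum = sum + x
end
assert(sum == 15)
-- x is not in scope here; a closure created inside the loop would
-- capture its own iteration's x, not a shared one.
```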

Further, to answer your question: yes, it is less efficient, because it re-initializes the variable. If the Lua JIT/compiler has good pattern recognition, it might just reset the variable, but I can neither confirm nor deny that.

vdMeent
  • Sorry, I am confused. Which one are you referring to when you say "Yes, it is less efficient"? In both, the variable's value is redefined each iteration. – Mandrill Jun 27 '15 at 00:46
  • Sorry for the confusion. I was referring to the in-loop declaration of variables when writing 'Yes, it is less efficient'. – vdMeent Jun 27 '15 at 00:50