What is the inefficiency in this cairo code using alloc_locals

Question

The following code:

func pow4(n) -> (m : felt):
    alloc_locals
    local x

    jmp body if n != 0
    [ap] = 0; ap++
    ret

    body:
    x = n * n
    [ap] = x * x; ap++
    ret
end

func main():
    pow4(n=5)
    ret
end

is declared inefficient in the doc because of non-continuous memory.

I ran it and could not see any hole in the memory table:

Addr  Value
-----------
⋮
###
⋮
1:0   2:0
1:1   3:0
1:2   5
1:3   1:2
1:4   0:21
1:5   25
1:6   625

so I don't understand where the problem is. I do see a hole with n=0 though:

Addr  Value
-----------
⋮
###
⋮
1:0   2:0
1:1   3:0
1:2   0
1:3   1:2
1:4   0:21
⋮
1:6   0

that I can fix using:

    jmp body if n != 0
    x = 0
    ret

but it's not what's suggested:

Move the instruction alloc_locals.
or Use tempvar instead of local.

score 3 · Answer 1 · answered Jun 16 '22 at 17:03

You are correct that the inefficiency is due to the memory hole, and correct that it appears only for n=0. Holes do not cause an inefficiency just by existing, but rather, their existence usually means that an equivalent code could have executed using fewer memory cells in total (in this case, 6 instead of 7, in the memory section "1:"). To make it more efficient, one should aim to remove the hole (so that the parts before and after it become continuous), whereas your suggested solution just fills it (which the prover does anyway). So your solution would still use 7 memory cells in the memory segment; and will in fact also use 2 additional cells in the program section (for the x=0 command), so it is actually less efficient than leaving the hole empty: compile and check!

The inefficiency in the original code arises from the local variable x being assigned a memory cell even in the case n=0, despite not being used. To make it more efficient we would simply not want to declare and allocate x in this case. This can be done by moving the alloc_locals command inside "body", so that it isn't executed (and the locals aren't allocated) in the n=0 case, as in the first suggestion: this saves one memory cell in the n=0 case and doesn't affect the n!=0 case. Note that alloc_locals does not have to appear right at the beginning of the code.

The second suggestion is to make x a tempvar instead of a local (and remove alloc_locals entirely). This again means no local variable is allocated in the n=0 case, thus again only 6 memory cells will be used instead of 7. And in fact, because the code is so simple, using the tempvar is also more efficient that declaring it as a local, at least when used as tempvar x = n * n, as it skips merges the ap += 1 command (which is what alloc_locals does in this case) with the x = n * n command, rather than run them separately.

Beyond discussing the theory, you should compile each of these options and compare the lengths of the code and memory segments obtained, and see what is really the most efficient and by how much: this is always true when optimizing, you should always check the actual performance.

Fun fact: In early versions of Cairo, memory holes were not merely inefficiencies -- they wouldn't compile at all, and the prover could not generate a proof for them. That's part of the reason they are given this much weight in the tutorials, even though they are not that big a deal today. If this were still the case, then your suggested solution would indeed have fixed this program for the n=0 case (i.e. made it compile when it otherwise wouldn't have), just not in the most efficient way possible. — Dan Carmon, Jun 16 '22 at 17:09
thanks for the clarifications. Actually compiling my solution brings the same result as the one suggested because I'm reusing the slot allocated when n=0 and don't ap++. I'll post all the options with their memory trace in another answer — ClementWalter, Jun 17 '22 at 09:05
Ah, sorry, I misread your code, I thought that you had the ```x=0``` part in addition to the ```[ap]=0; ap++``` line, I didn't notice you completely replaced it. So of course it is just as efficient, as you observed. That's a pretty neat fix! But the downside is that it lacks readability: it is not obvious that the local variable x happens to be at the location of the return value, [ap - 1] (and for more complicated logic would certainly not have been the case). Probably something to be avoided in production code :) — Dan Carmon, Jun 17 '22 at 10:45

score 1 · Answer 2 · answered Jun 17 '22 at 09:26

So following @dan-carmon answer and for the sake of completeness, here is a summary of the possible implementations and their memory table "1" when n = 0 & n > 0

# n = 0      n = 5
1:0   2:0    1:0   2:0
1:1   3:0    1:1   3:0
1:2   0      1:2   5
1:3   1:2    1:3   1:2
1:4   0:14   1:4   0:14
1:5   0      1:5   25
             1:6   625

Note however that the implementation using tempvar uses 2 slots less than the others in the program table "0".

Implementations tested

func pow4_reuse_slot(n) -> (m : felt):
    alloc_locals
    local x

    jmp body if n != 0
    x = 0
    ret

    body:
    x = n * n
    [ap] = x * x; ap++
    ret
end

func pow4_alloc_in_body(n) -> (m : felt):
    jmp body if n != 0
    [ap] = 0; ap++
    ret

    body:
    alloc_locals
    local x
    x = n * n
    [ap] = x * x; ap++
    ret
end

func pow4_use_tempvar(n) -> (m : felt):
    jmp body if n != 0
    [ap] = 0; ap++
    ret

    body:
    tempvar x = n * n
    [ap] = x * x
    ret
end

What is the inefficiency in this cairo code using alloc_locals

2 Answers2

Implementations tested