
We already know that function calls used to be limited to 255 explicitly passed arguments. That limit was removed in Python 3.7, and now the only bound is sys.maxsize, which is the general size limit of Python's containers. But what about local variables?

We can't add local variables to a function dynamically, and modifying the locals() dictionary directly is not supported, so we can't simply test this by brute force. Even if you change locals() using compile() or exec(), it doesn't affect function.__code__.co_varnames, so you cannot access those variables by name inside the function.

In [142]: def bar():
     ...:     exec('k=10')
     ...:     print(f"locals: {locals()}")
     ...:     print(k)
     ...:     g = 100
     ...:     
     ...:     

In [143]: bar()
locals: {'k': 10}
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-143-226d01f48125> in <module>()
----> 1 bar()

<ipython-input-142-69d0ec0a7b24> in bar()
      2     exec('k=10')
      3     print(f"locals: {locals()}")
----> 4     print(k)
      5     g = 100
      6 

NameError: name 'k' is not defined

In [144]: bar.__code__.co_varnames
Out[144]: ('g',)

This means that even if you use a for loop like:

for i in range(2**17):
    exec(f'var_{i} = {i}')

locals() will contain 2**17 variables, but you still cannot do something like print(var_100) inside the function.
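A minimal sketch of the same effect on a small scale (the names bar and var_1 are illustrative): because bar never assigns var_1 in its own source, the compiler treats var_1 as a global name, so the variables created by exec never become fast locals and never appear in co_varnames.

```python
def bar():
    for i in range(3):
        # exec writes into a locals() dict, not into the function's fast locals
        exec(f"var_{i} = {i}")
    try:
        # bar never assigns var_1 itself, so this is compiled as a global lookup
        return var_1
    except NameError as exc:
        return f"NameError: {exc}"

print(bar())
print(bar.__code__.co_varnames)  # only names assigned in bar's own source
print(bar.__code__.co_names)     # var_1 shows up here, as a global name
```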

We know there's basically no need to add variables to a function dynamically, since you can use a dictionary (in other words, a custom namespace) instead. But what's the proper way to test the maximum number of local variables a function can have?


2^32. The LOAD_FAST op used for loading local variables only has a 1-byte or 2-byte oparg depending on the Python version, but this can and will be extended up to 4 bytes by one or more EXTENDED_ARG ops, allowing access to 2^32 local variables. You can see some of the helpers used for EXTENDED_ARG in Python/wordcode_helpers.h. (Note that the opcode documentation for EXTENDED_ARG in the dis docs hasn't yet been updated to reflect the new Python 3.6 wordcode structure.)
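To make the arithmetic concrete: in 3.6+ wordcode each code unit carries one argument byte, so an instruction needs one EXTENDED_ARG prefix per extra argument byte, and three prefixes reach the full 32 bits. The function below is a hypothetical sketch of that counting under this assumption, not CPython's own helper from wordcode_helpers.h.

```python
def extended_arg_prefixes(oparg: int) -> int:
    """Count the EXTENDED_ARG units that 3.6+ wordcode needs in front of an
    instruction whose argument is `oparg` (one argument byte per code unit)."""
    if not 0 <= oparg < 2**32:
        raise ValueError("oparg must fit in 32 bits")
    prefixes = 0
    while oparg > 0xFF:
        oparg >>= 8  # each prefix supplies one more high-order byte
        prefixes += 1
    return prefixes

for n in (255, 256, 65536, 2**32 - 1):
    print(n, extended_arg_prefixes(n))
# 255 0
# 256 1
# 65536 2
# 4294967295 3
```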


Regarding exec() and its behaviour with locals, there is already an open discussion here: How does exec work with locals?

As for the question itself, it seems practically impossible to test this by dynamically adding variables to the local namespace in a way that shows up in the function's __code__.co_varnames. The reason is that local variables are resolved only within code that is byte-compiled together. exec and eval are bound by the same restriction in other situations, such as executing code that contains private (name-mangled) variables:

In [154]: class Foo:
     ...:     def __init__(self):
     ...:         __private_var = 100
     ...:         exec("print(__private_var)")

In [155]: f = Foo()
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-155-79a961337674> in <module>()
----> 1 f = Foo()

<ipython-input-154-278c481fbd6e> in __init__(self)
      2     def __init__(self):
      3         __private_var = 100
----> 4         exec("print(__private_var)")
      5 
      6 

<string> in <module>()

NameError: name '__private_var' is not defined

Read https://stackoverflow.com/a/49208472/2867928 for more details.
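A small sketch of the mangling behind that NameError, using a hypothetical variant of Foo: because the assignment sits inside a class body, the compiler rewrites __private_var to _Foo__private_var, while code compiled separately by eval() gets no such rewrite, so it only finds the variable under its mangled name.

```python
class Foo:
    def __init__(self):
        __private_var = 100
        # The compiler stored the name above as _Foo__private_var
        self.local_names = sorted(locals())
        # eval() compiles its string on its own, outside the class context,
        # so the mangled name has to be spelled out explicitly
        self.value = eval("_Foo__private_var")

f = Foo()
print(f.local_names)  # ['_Foo__private_var', 'self']
print(f.value)        # 100
```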

However, this doesn't mean we can't work out the limit in theory, i.e. by analyzing how Python stores and addresses local variables in its bytecode.

One way to do this is to look at a function's bytecode and see how the relevant instructions are encoded. The dis module is a great tool for disassembling Python code; for example, we can disassemble a simple function as follows:

>>> # VERSIONS BEFORE PYTHON-3.6
>>> import dis
>>> 
>>> def foo():
...     a = 10
... 
>>> dis.dis(foo)
  2           0 LOAD_CONST               1 (10)
              3 STORE_FAST               0 (a)
              6 LOAD_CONST               0 (None)
              9 RETURN_VALUE

Here the leftmost number is the source line number, and the column of numbers after it is the byte offset of each instruction in the bytecode.
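The same columns can also be read programmatically with dis.get_instructions(), which yields one Instruction per opcode with offset, opname and argrepr attributes. This is only a sketch: the offsets and opcode names it prints depend on the interpreter running it, so a modern Python will not reproduce the pre-3.6 numbers above.

```python
import dis

def foo():
    a = 10

# One Instruction per opcode: byte offset, opcode name, human-readable argument
for ins in dis.get_instructions(foo):
    print(ins.offset, ins.opname, ins.argrepr)
```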

The STORE_FAST opcode stores TOS (top of stack) into the local co_varnames[var_num]. Since the difference between its offset and that of the next instruction is 3 (6 - 3), each STORE_FAST instruction occupies 3 bytes: the first byte holds the opcode itself, and the other two bytes hold its argument (the variable index), which gives 2^16 possible values.
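The 3-byte layout can be checked by decoding a hand-written byte string. This is a sketch that assumes CPython 3.5's opcode numbers (LOAD_CONST = 100, STORE_FAST = 125, RETURN_VALUE = 83, HAVE_ARGUMENT = 90); decode_legacy is a hypothetical helper, and the bytes are constructed by hand to mirror the disassembly of foo() above rather than taken from a real 3.5 interpreter.

```python
# Opcode numbers assumed from CPython 3.5's opcode table
LOAD_CONST, STORE_FAST, RETURN_VALUE = 100, 125, 83
HAVE_ARGUMENT = 90  # opcodes >= this carried a 2-byte little-endian argument

def decode_legacy(raw):
    """Decode pre-3.6 bytecode into (offset, opcode, arg) tuples."""
    out = []
    i = 0
    while i < len(raw):
        op = raw[i]
        if op >= HAVE_ARGUMENT:
            arg = raw[i + 1] | (raw[i + 2] << 8)  # 16-bit argument
            out.append((i, op, arg))
            i += 3
        else:
            out.append((i, op, None))  # no argument: 1-byte instruction
            i += 1
    return out

# Hand-written bytes mirroring foo(): a = 10; return None
foo_bytes = bytes([LOAD_CONST, 1, 0, STORE_FAST, 0, 0, LOAD_CONST, 0, 0, RETURN_VALUE])
print(decode_legacy(foo_bytes))
# [(0, 100, 1), (3, 125, 0), (6, 100, 0), (9, 83, None)]

# The largest index one STORE_FAST could hold on its own
print(decode_legacy(bytes([STORE_FAST, 0xFF, 0xFF])))  # [(0, 125, 65535)]
```

The offsets 0, 3, 6, 9 match the disassembly above, and 0xFFFF = 65535 is the largest index a 2-byte argument can express.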

Therefore, judging by this layout alone, a single compiled function could theoretically have only 65536 local variables.

Since Python 3.6, the interpreter uses 16-bit wordcode instead of bytecode: every instruction is always 2 bytes long (rather than 1 or 3), because the argument now takes up only 1 byte.
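A sketch of how that wordcode can be decoded, assuming CPython 3.6 or later: decode_wordcode (a hypothetical helper, not part of dis) walks co_code two bytes at a time and folds each EXTENDED_ARG prefix into the argument of the next instruction. On 3.11+ co_code also contains inline cache entries, which simply appear as extra 2-byte units. The test function many is generated with exec so that all 300 locals are compiled together.

```python
import dis

# EXTENDED_ARG's opcode number differs between versions, so look it up
EXTENDED_ARG = dis.opmap["EXTENDED_ARG"]

def decode_wordcode(code):
    """Walk 3.6+ wordcode in 2-byte units, folding EXTENDED_ARG prefixes
    into the argument of the instruction that follows them."""
    raw = code.co_code
    ext = 0
    result = []
    for offset in range(0, len(raw), 2):
        op, arg = raw[offset], raw[offset + 1]
        full_arg = ext | arg
        if op == EXTENDED_ARG:
            # Accumulated bits move up one byte for the next code unit
            ext = full_arg << 8
            continue
        ext = 0
        result.append((offset, dis.opname[op], full_arg))
    return result

# Hypothetical test function with 300 locals, compiled as one unit
src = "def many():\n" + "\n".join(f"    v{i} = {i}" for i in range(300)) + "\n    return v299\n"
ns = {}
exec(src, ns)
many = ns["many"]

store_args = [arg for _, name, arg in decode_wordcode(many.__code__) if name == "STORE_FAST"]
print(max(store_args))  # 299, an index that needed an EXTENDED_ARG prefix
```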

So if you disassemble the same function in Python 3.6 or later, you get the following result (exact output varies between versions), where STORE_FAST, like every other instruction, occupies only two bytes:

>>> dis.dis(foo)
  2           0 LOAD_CONST               1 (10)
              2 STORE_FAST               0 (a)
              4 LOAD_CONST               0 (None)
              6 RETURN_VALUE

However, @Alex Hall showed in a comment that you can exec an entire function definition with more than 2^16 variables, and those variables do appear in __code__.co_varnames. That still doesn't make it practical to test the full hypothesis: each extra power of two doubles the work, and beyond about 2^20 it becomes very time-consuming. Here is the code:

In [23]: code = '''
    ...: def foo():
    ...: %s
    ...:     print('sum:', sum(locals().values()))
    ...:     print('add:', var_100 + var_200)
    ...: 
    ...: ''' % '\n'.join(f'    var_{i} = {i}'
    ...:                 for i in range(2**20))

In [24]: exec(code)

In [25]: foo()
sum: 549755289600
add: 300

In [26]: len(foo.__code__.co_varnames)
Out[26]: 1048576
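A smaller, self-contained variant of the same approach, assuming any CPython 3 interpreter: it compiles one function with 2^16 + 1 locals (just past the 16-bit boundary discussed above) from a single source string, so every variable is a real fast local. The helper name build is hypothetical, and n is chosen to keep the run much cheaper than the 2**20 case.

```python
def build(n):
    # Compile one function whose body assigns n distinct locals
    body = "\n".join(f"    var_{i} = {i}" for i in range(n))
    src = f"def big():\n{body}\n    return var_{n - 1}\n"
    namespace = {}
    exec(src, namespace)
    return namespace["big"]

n = 2**16 + 1
big = build(n)
print(len(big.__code__.co_varnames))  # 65537
print(big())                          # 65536
```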

This means that although STORE_FAST's argument is only 2 bytes wide (in pre-3.6 bytecode) and so "theoretically" can't address more than 2^16 different variables, there must be some other mechanism that makes it possible to address more. As it turns out, that mechanism is EXTENDED_ARG, which, as the documentation says, prefixes any opcode whose argument is too big to fit into the default two bytes. Its two bytes plus the instruction's own two bytes form a 4-byte argument, so the limit is 2^16 × 2^16 = 2^32.

EXTENDED_ARG(ext)

Prefixes any opcode which has an argument too big to fit into the default two bytes. ext holds two additional bytes which, taken together with the subsequent opcode’s argument, comprise a four-byte argument, ext being the two most-significant bytes.
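The rule in that quote can be written out directly. This is a sketch of the pre-3.6 arithmetic only (combine_extended_arg is a hypothetical name): ext becomes the high 16 bits and the instruction's own argument the low 16 bits.

```python
def combine_extended_arg(ext: int, arg: int) -> int:
    """Pre-3.6 rule: ext supplies the two most-significant bytes,
    the instruction's own argument supplies the two least-significant."""
    if not (0 <= ext <= 0xFFFF and 0 <= arg <= 0xFFFF):
        raise ValueError("each half must fit in 16 bits")
    return (ext << 16) | arg

print(combine_extended_arg(1, 0))            # 65536, the first index past 2**16 - 1
print(combine_extended_arg(0xFFFF, 0xFFFF))  # 4294967295, i.e. 2**32 - 1
```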

  • Perhaps I've misunderstood, but I can construct and execute a function with 2^17 local variables: https://repl.it/repls/PleasantMediocreCharacterencoding – Alex Hall May 12 '18 at 15:02
  • @AlexHall Those variables should exist in `__code__.co_varnames`. Please check out the update in question for clarifications. – Mazdak May 12 '18 at 16:01
  • I don't think you looked at my demo, or at least didn't understand it. I've updated it in response to what you wrote. I'm not `exec`ing individual assignments, it's one big function. The variables are there normally in every way. – Alex Hall May 12 '18 at 16:31
  • @AlexHall Yes, that works fine. I tested similar code on my end and it didn't work as expected, but this one seems to work perfectly. Theoretically, though, it doesn't make sense! – Mazdak May 12 '18 at 16:41
  • @AlexHall I updated the answer with your suggested code. Thanks for the comment. However, it's not 100% clear how Python tells these variables apart. I couldn't find anything in the source code either. – Mazdak May 12 '18 at 17:37