Backstory: I was trying to implement one way to handle -v parameters to increase the verbosity of an application. To that end, I wanted to use a global variable that initially points to an empty lambda function. If -v is given, the variable is changed and gets another lambda function assigned that prints its input.

MWE: I noticed that this did not work as expected when calling the lambda function from another module after importing it via from x import *...

mwe.py:

from mod import *
import mod

def f():
  vprint("test in f")

vprint("test before")
print("before: %d" % foo)
set_verbosity(1)
vprint("test after")
print("after: %d" % foo)
f()
mod.vprint("explicit: %d" % mod.foo)
modf()

mod.py:

vprint = lambda *a, **k: None
foo = 42

def set_verbosity(verbose):
  global vprint, foo
  if verbose > 0:
    vprint = lambda *args, **kwargs: print(*args, **kwargs)
    foo = 0

def modf():
  vprint("modf: %d" % foo)

The output is

before: 42
after: 42
explicit: 0
modf: 0

where the "explicit" and "modf" outputs are due to the mod.vprint and modf calls at the end of the mwe. All other invocations of vprint (that go through the imported version of vprint) are apparently not using the updated definition. Likewise, the value of foo seems to be imported only once.

Question: It looks to me as if the from x import * type of imports copies the state of the globals of the imported module. I am not so much interested in workarounds per se but the actual reason for this behavior. Where is this defined in the documentation and what's the rationale?

Workaround: As a side note, one way to implement this anyway is to wrap the global lambda variables by functions and export only them:

_vprint = lambda *a, **k: None

def vprint(*args, **kwargs):
  _vprint(*args, **kwargs)

def set_verbosity(verbose):
  global _vprint
  if verbose > 0:
    _vprint = lambda *args, **kwargs: print(*args, **kwargs)

This makes it work with the import-from way that allows the other module to simply call vprint instead of explicitly dereferencing via the module's name.

stefanct
  • Possible duplicate of [Global Variable from a different file Python](https://stackoverflow.com/questions/3400525/global-variable-from-a-different-file-python) – Adam.Er8 Jul 31 '19 at 15:22
  • No, that deals with the same issue but does not explain **why** it works like that nor where it is documented. (Neither does the often referenced FAQ entry on the topic: https://docs.python.org/2/faq/programming.html#how-do-i-share-global-variables-across-modules). – stefanct Jul 31 '19 at 15:28
  • the python tag has a fair share of experts answering regularly, I hope someone will supply some insight – Adam.Er8 Jul 31 '19 at 15:34

2 Answers

TL;DR: when you do from module import *, you're copying the names and their associated references; changing the reference associated with the original name does not change the reference associated with the copy.


This deals with the underlying difference between names and references. The underlying reason for the behavior has to do with how python handles such things.

Python code never manipulates raw memory directly. Instead, almost everything in Python deals with references to objects, and you can change what those references point to. When you do my_list[2] = 5, you're not overwriting the object that was stored there - rather, you're making index 2 of my_list (its third slot) reference the object 5 instead. The object that my_list[2] used to reference is untouched; if nothing else refers to it any more, Python will reclaim the memory it was using (in CPython this usually happens right away, through reference counting).
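As a minimal sketch of that list example (the names `my_list` contents and `old_item` are made up for illustration), keeping a second reference to the old element shows that the assignment only rebinds the slot:

```python
# Illustrative only: the values and the name `old_item` are hypothetical.
my_list = ["a", "b", ["original"]]
old_item = my_list[2]   # a second reference to the inner list

my_list[2] = 5          # rebinds slot 2; the inner list itself is not modified

print(my_list)          # ['a', 'b', 5]
print(old_item)         # ['original']
```

Had `old_item` not existed, nothing would refer to the inner list after the assignment and it would become eligible for cleanup.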

The same principle goes with names. Any given namespace in python is comparable to a dict - each name has a corresponding reference. And this is where the problems come in.
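For a module, the "comparable to a dict" part is quite literal. The sketch below builds a throwaway module object in memory (the name `demo` is hypothetical and stands in for a real file) and inspects its namespace with `vars()`:

```python
import types

# Hypothetical in-memory module, standing in for a real file on disk.
demo = types.ModuleType("demo")
demo.foo = 42

namespace = vars(demo)           # the module's __dict__
print(type(namespace) is dict)   # True
print(namespace["foo"])          # 42

# Rebinding a name through the dict is the same as assigning the attribute.
namespace["foo"] = 0
print(demo.foo)                  # 0
```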

Consider the difference between the following two statements:

from module import *
import module

In both cases, module is loaded into memory.

In the latter case, only one thing is added to the local namespace - the name 'module', which references the entire memory block containing the module that was just loaded. Or, well, it references the memory block of that module's own namespace, which itself has references to all the names in the module, and so on all the way down.

In the former case, however, every name in module's namespace is copied into the local namespace. The same block of memory still exists, but instead of having one reference to all of it, you now have many references to small parts of it.
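As for where this is documented: the language reference's section on the import statement says that with a star, all public names defined in the module are bound in the local namespace. A rough emulation of that binding step might look like the sketch below. `star_import` is a hypothetical helper written for this illustration, not something Python provides, and the in-memory `mod` stands in for a real mod.py:

```python
import types

# Hypothetical stand-in for a module file containing two names.
mod = types.ModuleType("mod")
exec("foo = 42\n_private = 1\n", mod.__dict__)

def star_import(module, target):
    """Rough emulation of `from module import *` into the dict `target`."""
    names = getattr(module, "__all__", None)
    if names is None:
        # Without __all__, only names not starting with an underscore are public.
        names = [n for n in vars(module) if not n.startswith("_")]
    for name in names:
        target[name] = getattr(module, name)   # one-time binding, no link kept

local_ns = {}
star_import(mod, local_ns)

print(sorted(local_ns))               # ['foo']
print(local_ns["foo"] is mod.foo)     # True
```

Each public name is bound once, to whatever object the module's name referenced at that moment.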


Now, let's say we do both of those statements in succession:

from module import *
import module

This leaves us with one name 'module' referencing all the memory the module was loaded into, and a bunch of other names that reference individual parts of that block. We can verify that they point to the same thing:

print(module.func_name == func_name)
# True

But now, we try to assign something else to module.func_name:

module.func_name = lambda x: None
print(module.func_name == func_name)
# False

It's no longer the same. Why?

Well, when we did module.func_name = lambda x: None, we first created a new function object, and then we changed module's 'func_name' name to reference that object instead of what it was referencing. Note that, like the example I gave earlier with lists, we didn't change the thing that module.func_name was previously referencing - it still exists, and the local func_name continues to reference it.

So when you do from module import *, you're copying the names and their associated references; changing the reference associated with the original name does not change the reference associated with the copy.
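The flip side is worth a quick sketch: a copied name does see changes made *to the object itself*, just not rebinding of the original name. The names `settings` and `alias` below are made up for illustration:

```python
settings = {"verbose": 0}
alias = settings             # both names reference the same dict

settings["verbose"] = 1      # mutation of the shared object
print(alias["verbose"])      # 1

settings = {"verbose": 2}    # rebinding: `settings` now names a different dict
print(alias["verbose"])      # 1
```

The assignment in the question's set_verbosity is of the second kind, which is why the imported copy never notices it.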


The workaround for this is to not do import *. In fact, this is pretty much the ultimate reason why using import * is usually considered poor practice, save for a handful of special cases. Consider the following code:

# module.py
variable = "Original"

# file1.py
import module

def func1():
    module.variable = "New"

# file2.py
import module
import file1

print(module.variable)
file1.func1()
print(module.variable)

When you run python file2.py, you get the following output:

Original
New

Why? Because file1 and file2 both imported module, and in both of their namespaces 'module' is pointing to the same block of memory. module's namespace contains a name 'variable' referencing some value. Then, the following things happen:

  1. file2 says "okay, module, please give me the value associated with the name 'variable' in your namespace."
  2. file1.func1() says "okay, module, the name 'variable' in your namespace now references this other value."
  3. file2 says "okay, module, please give me the value associated with the name 'variable' in your namespace."

Since file1 and file2 are still both talking to the same bit of memory, they stay coordinated.
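That "same bit of memory" claim can be checked in a single file. The sketch below uses the standard json module purely as a convenient example, and `demo_flag` is a hypothetical attribute added (and removed again) only for the demonstration:

```python
import importlib
import sys

import json as first_ref
second_ref = importlib.import_module("json")   # a second, independent import

print(first_ref is second_ref)            # True
print(sys.modules["json"] is first_ref)   # True

first_ref.demo_flag = "New"   # rebind a name in the module's namespace
print(second_ref.demo_flag)   # New

del first_ref.demo_flag       # clean up the demo attribute
```

Every import of an already-loaded module hands back the one module object cached in sys.modules, so attribute lookups through it always see the current bindings.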

Green Cloak Guy
  • Upvote for that list example! Learn something every day, I had no idea each list cell was a separate reference. – kabanus Jul 31 '19 at 17:18
  • @kabanus It's also something you can realize naturally, when you consider "How can objects of arbitrary size (like strings) be stored in array-style lists (which would be necessary for performance reasons)"? If you conceptualize it in C terms, the answer is "you'd use pointers, obviously" - like, an array of strings is actually just an array of `char*`, with each string allocated independently. But for that behavior to be consistent for *all* objects, then clearly *every* object is referred to by its reference when you put it into a list. And things follow logically from there. – Green Cloak Guy Jul 31 '19 at 17:21
  • That's only true if it *has* to be consistently implemented for all objects. But it does not. Perl for example does not store pointers to integers in lists. – stefanct Aug 01 '19 at 15:57

Random stab in the dark ahead:

If you look at the docs for __import__, there's the bit:

On the other hand, the statement from spam.ham import eggs, sausage as saus results in

_temp = __import__('spam.ham', globals(), locals(), ['eggs', 'sausage'], 0)
eggs = _temp.eggs
saus = _temp.sausage

I think this is the key. If we infer that from mod import * results in something like:

_temp = __import__('mod', globals(), locals(), ['*'], 0)
vprint = _temp.vprint
foo = _temp.foo

This shows what the problem is. vprint is a reference to the old version of vprint: whatever mod.vprint was pointing to at the time of import. Reassigning what the vprint in mod is pointing to doesn't affect anything in mwe, because the mwe reference to vprint is still looking at the previous lambda.

It's similar to how this doesn't change b:

a = 1
b = a
a = 2

b is still pointing to 1, because reassigning a doesn't affect what b is looking at.

On the other hand, mod.vprint does work because the name vprint is looked up in mod's namespace every time it is used, so it always sees the current binding instead of the one copied at import time.
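To try this without creating two files, here is a single-file sketch under the assumption that the expansion above is roughly right. The module object is built in memory with types.ModuleType and exec, which is only a stand-in for the question's mod.py:

```python
import types

# Hypothetical in-memory stand-in for mod.py from the question.
mod = types.ModuleType("mod")
exec(
    "vprint = lambda *a, **k: None\n"
    "def set_verbosity(verbose):\n"
    "    global vprint\n"
    "    if verbose > 0:\n"
    "        vprint = lambda *args, **kwargs: print(*args, **kwargs)\n",
    mod.__dict__,
)

# Roughly what `from mod import *` does: bind the local names once.
vprint = mod.vprint
set_verbosity = mod.set_verbosity

set_verbosity(1)                   # rebinds the name inside mod only

print(vprint is mod.vprint)        # False: the local name still holds the old lambda
vprint("via copied name")          # prints nothing
mod.vprint("via module lookup")    # prints: via module lookup
```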


This was a random stab because I think I know the answer based on some random reading I did a while ago. If I'm incorrect, please let me know and I'll remove this to avoid confusion.

Carcigenicate
  • Thanks, very insightful and well referenced (pun intended). I'd still like to know why it is implemented that way (because it seems rather insane from the user's PoV, and the myriad of questions on the internet on the topic proves that all too well - maybe including the one above ;). After all, the behavior seems to be very exotic and I don't know other languages that do it that way (not even Perl is so peculiar(!)). – stefanct Jul 31 '19 at 16:42
  • @stefanct I'm not sure we can answer the justification. It may be oversight on BDFL's part, or a poorly-documented-but-intentional result of how `import`s work. The idea may have been that relying on globals is bad, and relying on mutating globals across files is even worse, so it shouldn't be a problem in any good code. Idk. – Carcigenicate Jul 31 '19 at 16:46
  • Since Python has functions as first-class citizens I disagree, because as shown in this very example global variables and (exported) functions **would** be interchangeable if the export/import were implemented differently. NB: I don't modify the global variable across files but only within the module that defines it (exactly for the reasons global variables are usually deemed bad: breaking encapsulation). And that a function changes functionality because some other function within the same module/class is called before that is perfectly normal behavior and the reason getters/setters exist. – stefanct Jul 31 '19 at 17:02