I have some Python code that looks like this:

import mymodule

sum_global = mymodule.populateGlobalList("bigdict.txt")

and so on... with the code in mymodule including the method

def populateGlobalList(thefile):
    #do the stuff

So far, so good.

But elsewhere in mymodule I have a method that says

def usefulFunction(arg1, arg2):
   #lotsofstuff
   if arg1 not in sum_global:
      #add to list

The interpreter trips on the undefined sum_global, which makes sense. usefulFunction could take sum_global as an argument, at least in theory. But sum_global is meant to be an English dictionary that is used extensively to check whether the words encountered are English words (or at least correctly spelled). Since this check occurs a lot, it would feel unnecessarily awkward to make it local.

On the other hand just declaring a global sum_global in the module (essentially to fool the interpreter), with the intention that this empty vessel is filled in in the program importing mymodule, feels completely wrong.

What is a sound design for this situation?

Martijn Pieters
Stumbler
  • Globals are *per module*. Put `sum_global` in `mymodule` or pass it in as an argument to the function. – Martijn Pieters Apr 29 '16 at 11:30
  • I'm not sure if it makes sense exactly, but you could make it a class and have `sum_global` be a class attribute, then in `usefulFunction` just refer to `self.sum_global`. – gplayer Apr 29 '16 at 11:37
  • Not sure if this is what you were looking for, but if you supply an object as a default in the `kwargs` list of a function def, the object is created once and then always supplied as the default, e.g. `usefulFunction(arg1, arg2, _cached_list={})`. – dan-man Apr 29 '16 at 11:37
  • @dan-man: the problem is then that `populateGlobalList()` takes an argument. So the value isn't all that global, as there may well be different results for different files. What if there is another module that calls `populateGlobalList('anotherbigdict.txt')`? – Martijn Pieters Apr 29 '16 at 11:44
  • @MartijnPieters - without knowing more details it's hard to say whether my suggestion is suitable. In principle you might want to remove the `populateGlobalList` entirely, and instead pass the filename to the `usefulFunction`, and then store everything in the cache object, indexed by both filename and `arg1`. i.e. so that you are lazily "populating" the list. – dan-man Apr 29 '16 at 11:48
  • @dan-man: yup, that's perhaps an option too. But all this sounds more like there should be a class instead, coupling state and behaviour. – Martijn Pieters Apr 29 '16 at 11:49
  • @MartijnPieters - I don't disagree (and I didn't downvote your answer), but I think there are times when the `kwarg` approach is better for being concise. It can also be helpful when you have a true "cache" of things that you want to manage properly (although static members in a class would work too of course). – dan-man Apr 29 '16 at 11:54
  • @dan-man: the `re` module follows this approach (as well as give you the OO option via `re.compile()`), and the hidden cache has bitten people in the past. See [Python re module becomes 20 times slower when looping on more than 100 different regex](https://stackoverflow.com/q/17325281) and [Why are uncompiled, repeatedly used regexes so much slower in Python 3?](https://stackoverflow.com/q/14756790). By making the cache 'visible' in the form of instances, you move responsibility to the API user. Explicit is better than implicit! – Martijn Pieters Apr 29 '16 at 11:56

1 Answer

Each module has its own global namespace. You added sum_global to the wrong module; it lives in the namespace of the importing script, not in mymodule.

Either put sum_global in mymodule or pass it in as an argument to the function that needs it.
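To see the per-module rule in action, here is a minimal sketch. It assumes a module built in memory with `types.ModuleType` as a stand-in for a real `mymodule.py`, and the simplified `useful_function` is hypothetical:

```python
import types

# Stand-in for mymodule.py, built in memory so the example is self-contained.
mymodule = types.ModuleType("mymodule")
exec(
    "def useful_function(word):\n"
    "    return word in sum_global\n",
    mymodule.__dict__,
)

# This name lives in the importing script's namespace, not in mymodule's.
sum_global = {"hello", "world"}

try:
    mymodule.useful_function("hello")
    lookup_failed = False
except NameError:
    lookup_failed = True  # mymodule has no sum_global of its own

# Binding the name inside mymodule's namespace makes the lookup succeed.
mymodule.sum_global = sum_global
found = mymodule.useful_function("hello")
```

The function looks up `sum_global` in the namespace of the module where it was defined, so the first call fails even though the calling script has a variable with that name.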

It sounds as if you want to defer calculating the sum_global value until you have a filename; you could have populateGlobalList() set the global here; this is all in mymodule:

sum_global = None

def populateGlobalList(filename):
    global sum_global
    if sum_global is None:
        sum_global = "result of your population work"

This function doesn't return the data; instead it sets the global variable in the module (provided it hasn't already been set).
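A fuller sketch of that pattern might look like the following. The one-word-per-line file format, the lowercase names, the simplified `useful_function`, and the `write_words` demo helper are all assumptions for illustration, not part of the original code:

```python
import os
import tempfile

# Everything down to useful_function() would live in mymodule.py.
sum_global = None  # filled in by the first populate_global_list() call


def populate_global_list(filename):
    """Load one word per line into the module-level set, once only."""
    global sum_global
    if sum_global is None:
        with open(filename, encoding="utf-8") as handle:
            sum_global = {line.strip().lower() for line in handle if line.strip()}


def useful_function(word):
    """Hypothetical helper: report whether word is in the loaded list."""
    if sum_global is None:
        raise RuntimeError("call populate_global_list() first")
    return word.lower() in sum_global


def write_words(words):
    """Write words to a temporary file and return its path (demo only)."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                     encoding="utf-8") as handle:
        handle.write("\n".join(words))
    return handle.name


first = write_words(["apple", "banana"])
second = write_words(["cherry"])

populate_global_list(first)
populate_global_list(second)  # silently ignored: the global is already set

knows_apple = useful_function("Apple")
knows_cherry = useful_function("cherry")

os.remove(first)
os.remove(second)
```

Note that the second call has no effect, so `knows_cherry` ends up false. That hidden "first caller wins" behaviour is one reason to be wary of this design.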

However, you should try to avoid creating globals like these. Since the users of your module must provide a filename, it is much better if the user code tracks the result of the populateGlobalList() call and explicitly passes it into the usefulFunction() call:

import mymodule

sum_global = mymodule.populateGlobalList("bigdict.txt")

result = mymodule.usefulFunction(arg1, arg2, sum_global)

and retool usefulFunction() to require such an argument.
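As a rough sketch of what the retooled functions could look like: the body of `useful_function` below is a guess based on the "add to list" comment in the question, and `populate_global_list` takes any iterable of lines rather than a filename purely so the example runs without a file on disk.

```python
def populate_global_list(lines):
    """Return a set of lower-cased words, one per input line."""
    return {line.strip().lower() for line in lines if line.strip()}


def useful_function(word, unknown_words, known_words):
    """Hypothetical body: collect words missing from known_words."""
    if word.lower() not in known_words:
        unknown_words.append(word)
    return unknown_words


known_words = populate_global_list(["apple\n", "banana\n"])
unknown = []
for token in ["Apple", "bananna", "banana"]:
    useful_function(token, unknown, known_words)
```

Here the caller owns the word set, so two callers with different files never interfere with each other.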

A next step would be for your module to use a class for this:

class MyClass(object):
    def __init__(self, filename):
        self._populate(filename)

    def _populate(self, filename):
        # read filename and store the result on the instance
        self.sum = "result of your population work"

    def useful_function(self, arg1, arg2):
        # do work with self.sum
        pass

then use that class everywhere:

interesting_object = mymodule.MyClass('bigdict.txt')
result = interesting_object.useful_function(arg1, arg2)
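
For a concrete feel of the class approach, here is a small runnable sketch. The names `WordList`, `from_file`, and `is_known` are hypothetical, and the one-word-per-line file format is an assumption:

```python
class WordList:
    """Hypothetical class coupling a word list with the code that uses it."""

    def __init__(self, words):
        self.words = {word.strip().lower() for word in words if word.strip()}

    @classmethod
    def from_file(cls, filename):
        """Build an instance from a file with one word per line."""
        with open(filename, encoding="utf-8") as handle:
            return cls(handle)

    def is_known(self, word):
        return word.lower() in self.words


# Two independent word lists can coexist, unlike a single module global.
english = WordList(["apple", "banana"])
dutch = WordList(["appel", "banaan"])

english_hit = english.is_known("Apple")
dutch_miss = dutch.is_known("apple")
```

Each instance carries its own state, which answers the "what if another module loads a different file?" concern raised in the comments.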
Martijn Pieters