
I find myself writing code like this relatively often:

import re

_munge_text_re = re.compile("... complicated regex ...")
def munge_text(text):
    match = _munge_text_re.match(text)
    # ... do stuff with match ...

Only munge_text uses _munge_text_re, so it would be better to make it local to the function somehow, but if I move the re.compile line inside the def then it will be evaluated every time the function is called, defeating the purpose of compiling the regular expression.

Is there a way to make _munge_text_re local to munge_text while still evaluating its initializer only once? The single evaluation needn't happen at module load time; on the first invocation of munge_text would be good enough.

The example uses a regex, and most of the time I need this it is for a regex, but it could be any piece of data that's expensive to instantiate (so you don't want to do it every time the function is called) and fixed for the lifetime of the program. ConfigParser instances also come to mind.

Extra credit: For reasons too tedious to get into here, my current project requires extreme backward compatibility, so a solution that works in Python 2.0 would be better than one that doesn't.

zwol

4 Answers


Now that it has state, just make a class for it:

import re

class TextMunger(object):

    def __init__(self, regex):
        self._pattern = regex
        self._munge_text_re = None

    def __call__(self, text):
        if self._munge_text_re is None:
            self._munge_text_re = re.compile(self._pattern)

        match = self._munge_text_re.match(text)
        # ... do stuff with match ...

munge_text = TextMunger("... complicated regex ...")

# The rest of your code stays the same

In case you didn't know, defining a __call__ method on a class makes its instances callable as though they were functions, so you can keep calling munge_text(text) exactly as before.
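To see both the `__call__` behavior and the compile-once behavior in action, here is a minimal sketch. `CountingMunger`, its `compile_count` attribute and the pattern `(\w+)` are hypothetical stand-ins made up for illustration; they are not part of the answer above.

```python
import re

class CountingMunger(object):
    """Hypothetical variant of TextMunger that counts how often it compiles."""

    def __init__(self, regex):
        self._pattern = regex
        self._compiled = None
        self.compile_count = 0  # how many times re.compile() has run

    def __call__(self, text):
        # Compile lazily, on the first call only.
        if self._compiled is None:
            self._compiled = re.compile(self._pattern)
            self.compile_count += 1
        match = self._compiled.match(text)
        # Return the first group if the pattern matched, else None.
        if match is None:
            return None
        return match.group(1)

munge_text = CountingMunger(r"(\w+)")

print(munge_text("hello world"))  # hello
print(munge_text("spam eggs"))    # spam
print(munge_text.compile_count)   # 1
print(callable(munge_text))       # True
```

The instance is called with ordinary function-call syntax, and the pattern is compiled once no matter how many calls follow.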

(This kind of problem is actually what led to my question about a lazy property decorator in Python, which might also interest you; I wouldn't bother with that unless you find yourself repeating this pattern a lot though.)
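For reference, a lazy property can be sketched as a non-data descriptor that caches its result in the instance's `__dict__`. The names `lazy_property` and the reworked `TextMunger` below are my own illustration, not the code from that question. It relies on descriptors (Python 2.2+) and decorator syntax (Python 2.4+), so it does not meet the 2.0 requirement; on Python 3.8+ the standard library's `functools.cached_property` does essentially the same job.

```python
import re

class lazy_property(object):
    """Hypothetical non-data descriptor: compute once, then cache on the instance."""

    def __init__(self, func):
        self.func = func
        self.__doc__ = func.__doc__
        self.name = func.__name__

    def __get__(self, obj, objtype=None):
        if obj is None:
            # Accessed on the class itself: return the descriptor.
            return self
        value = self.func(obj)
        # Store the result in the instance __dict__ under the same name.
        # This descriptor defines no __set__, so the instance attribute now
        # shadows it and __get__ is not called again for this instance.
        obj.__dict__[self.name] = value
        return value

class TextMunger(object):
    def __init__(self, regex):
        self._pattern = regex

    @lazy_property
    def _munge_text_re(self):
        return re.compile(self._pattern)

    def __call__(self, text):
        return self._munge_text_re.match(text)
```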

detly
  • +1, nice approach — but how do you do this so the docstring and such "look right" to introspection (in ipython/code completers/etc)? – Danica Jul 04 '13 at 06:59
  • @Dougal - good question. Usually I'm only interested in what Sphinx picks up, so it's enough to document the class as you would for any other class. You could try assigning to `self.__doc__`, but this won't fool eg. `help(munge_text)`. Maybe it **will** work for the other tools you use regardless. If not, maybe post it as another question. – detly Jul 04 '13 at 07:06
import re

_munge_text_re = None
def munge_text(text):
    global _munge_text_re
    _munge_text_re = _munge_text_re or re.compile("... complicated regex ...")
    match = _munge_text_re.match(text)
    # ... do stuff with match ...
Michael Lorton
  • That ... doesn't actually get rid of the global variable ... – zwol Jul 04 '13 at 02:29
  • No, but it gets rid of the expensive initialization – Michael Lorton Jul 04 '13 at 02:29
  • @Malvolio OP wanted to use locals though. – Rushy Panchal Jul 04 '13 at 02:31
  • The other solution is no better: instead of having a module-level variable that's only used to cache the RE, there's a module-level class that's only used to cache the RE. Python isn't the best language for flexible scoping. – Michael Lorton Jul 04 '13 at 02:35
  • I wondered if you could get rid of both globals and classes by using some kind of closure, but as far as I can see you'd still end up with a second global function. – detly Jul 04 '13 at 02:39
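Here is a minimal sketch of the closure idea from the last comment, assuming a placeholder pattern and the hypothetical factory name `_make_munge_text`. The factory is indeed a second module-level function, but it can be deleted once it has run. The pattern is compiled when the factory runs, i.e. at import time. Closures over enclosing-function variables need nested scopes, which arrived in Python 2.1 (via `from __future__ import nested_scopes`) and became standard in 2.2, so this would not satisfy the 2.0 requirement from the question.

```python
import re

def _make_munge_text():
    # Runs once, at module load: compile the pattern here.
    # Placeholder pattern; substitute your real regex.
    munge_re = re.compile(r"(\w+)=(\w+)")

    def munge_text(text):
        # munge_re is a closure variable, not a module global.
        match = munge_re.match(text)
        if match is None:
            return None
        return match.groups()

    return munge_text

munge_text = _make_munge_text()
del _make_munge_text  # the "second global function" no longer lingers
```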

An alternative approach -- one I mention only for information, and not because I'd use it in production, so I'll community-wiki this -- would be to store the state in the function itself. You could use hasattr, or catch an AttributeError:

def munge1(x):
    if not hasattr(munge1, 'val'):
        print('expensive calculation')
        munge1.val = 10
    return x + munge1.val

def munge2(x):
    try:
        munge2.val
    except AttributeError:
        print('expensive calculation')
        munge2.val = 10
    return x + munge2.val

after which

>>> munge1(3)
expensive calculation
13
>>> munge1(3)
13
>>> munge2(4)
expensive calculation
14
>>> munge2(4)
14

but to be honest, I usually switch to a class at this point.

DSM
  • N.B. this does *not* work with Python 2.0. – zwol Jul 05 '13 at 22:54
  • @Zack: it works with recent Pythons, even in the 2.x line. Or do you mean 2.0 literally, i.e. the one which came out over a decade ago? – DSM Jul 05 '13 at 23:24
  • Yes, I literally mean 2.0. Backward compatibility all the way to 2.0 is a requirement for this particular project. You would probably be happier not knowing the rationale. – zwol Jul 05 '13 at 23:57
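Function attributes are the sticking point on 2.0 (they were added in Python 2.1). One workaround that, as far as I know, uses only features 2.0 already had is to hide the cache in a mutable default argument. This is my own sketch, not something any answer here proposes, and I have not tested it on an actual 2.0 interpreter; the pattern and the `_cache` parameter name are placeholders.

```python
import re

def munge_text(text, _cache=[]):
    # The default list is created once, when the def statement runs, and
    # the same list object is reused on every call. It starts empty, so the
    # expensive work happens on the first call only.
    if not _cache:
        _cache.append(re.compile(r"(\d+)"))  # placeholder pattern
    match = _cache[0].match(text)
    # ... do stuff with match ...
    return match
```

The leading underscore is only a convention: a caller could still pass `_cache` explicitly. If two threads race on the first call, both may compile and append, but `_cache[0]` is still a valid compiled pattern, so the race is harmless here.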

You could do the following, I suppose:

import re

def munge_text(text):
    global munge_text
    _munge_text_re = re.compile("... complicated regex ...")
    def t(text):
        match = _munge_text_re.match(text)
        # ... do stuff with match ...
    munge_text = t
    return t(text)
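A quick way to check that the rebinding works, and to see its main caveat, is the sketch below; the `compile_calls` counter, the placeholder pattern and the `early_ref` name are hypothetical, added only for the demonstration.

```python
import re

compile_calls = []  # hypothetical counter, only for demonstrating the effect

def munge_text(text):
    global munge_text
    compile_calls.append(1)
    _munge_text_re = re.compile(r"(\w+)")  # placeholder pattern

    def t(text):
        match = _munge_text_re.match(text)
        return match and match.group(1)

    munge_text = t  # rebind the module-level name to the inner function
    return t(text)

early_ref = munge_text        # taken BEFORE the first call

print(munge_text("abc def"))  # abc  (first call: compiles, rebinds)
print(munge_text("xyz"))      # xyz  (calls t directly, no compile)
print(len(compile_calls))     # 1
early_ref("again")            # the stale reference still recompiles
print(len(compile_calls))     # 2
```

The caveat is that any reference to the original function taken before the first call (for example, a name another module imported from this one) keeps pointing at the outer function, which recompiles and rebinds on every call through that reference. Like any closure, `t` also needs nested scopes (Python 2.2, or 2.1 with the `__future__` import).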
Michael Lorton