How do I refactor 100s of Class Methods in Python?

Question

I am working on some legacy code (created by someone in love with spaghetti code) that has over 150 getters and over 150 setters. The getters look like this:

def GetLoadFee(self):
    r_str = ""
    if len(self._LoadFee) > 20:
        r_str = self._LoadFee[:20]
    else:
        r_str = self._LoadFee.strip()
    return r_str.strip()

def GetCurrency(self):
    r_str = ""
    if len(self._Currency) > 3:
        r_str = self._Currency[:3]
    else:
        r_str = self._Currency.strip()
    return r_str.strip()

I would love to take the contents of each of these Getters and put them into a decorator /closure or some other method to make this code easier to maintain. The Setters are all one liners, so they're not as important. But they are basically all the same too. Any ideas to make this less painful?

NOTE: I still need the original Getter names as they are used in other programs as this nasty script is used in lots of other legacy code.

Do you actually want to edit the legacy code or do you want to patch the existing classes on the fly? — Hyperboreus, Jan 12 '13 at 02:18
Sadly yes. I had to add 17 new getters and setters to it and there's a possibility that we'll have to add more, so it's time to refactor it. — Mike Driscoll, Jan 14 '13 at 14:20

abarnert · Accepted Answer · 2013-05-30T18:43:04.297

def make_generic_getter(name, maxlen):
    def getter(self):
        value = getattr(self, name)
        r_str = ""
        if len(value) > maxlen:
            r_str = value[:maxlen]
        else:
            r_str = value.strip()
        return r_str.strip()
    return getter

Now, you can do this:

class Foo(object):
    def __init__(self):
        self._Bar = 'abc'
        self._Baz = 'def'
    GetBar = make_generic_getter('_Bar', 5)
    GetBaz = make_generic_getter('_Baz', 2)

Then:

>>> f = Foo()
>>> f.GetBar()
'abc'
>>> f.GetBaz()
'de'

Clearly, there's also a lot of repetitive and unnecessary stuff in the original function. (And it would be much better to use PEP8-style names for your properties.) But obviously it's much easier to refactor first, then improve, than the other way around. (In other words, start here, but don't stop here.)

From the comments:

How does the method maker get the "self" reference?

The method maker doesn't actually get the self reference. There is no self reference to get at the time the method maker is being called. But there also is no self reference to get for a normal method at the time the class is being defined. In either case, you're just defining a function that takes self as its first parameter, and it somehow magically gets the appropriate self when you call it.

To really understand how this actually works, you need to know about descriptors. See Implementing Descriptors and Invoking Descriptors (or the 3.3 version), read it over a few times, look at how the @property decorator is implemented, play around in the interactive interpreter, give up, go to sleep, and try again tomorrow, and it should all click. But it's easier if you learn the magic version first, so let's do that, using a simpler example:

>>> def func(self): pass
>>> class C(object):
...     def meth(self): pass
...     fake1 = func
>>> C.fake2 = func
>>> func, C.meth, C.fake1, C.fake2
(<function __main__.func>, <unbound method C.meth>, <unbound method C.func>, <unbound method C.func>)

An unbound method is just a thing with an im_class holding its class, an im_func holding a normal function, and an im_self holding None. And when you do fake1 = func in the class definition, or C.fake2 = func after the fact, you don't actually end up with func itself as the value of fake1 or fake2, but with an unbound method wrapped around func, its im_class pointing at C.

>>> c = C()
>>> c.meth, c.fake1
(<bound method C.meth of <__main__.C object at 0x111ebb0d0>>, <bound method C.meth of <__main__.C object at 0x111ebb0d0>>)

When you take an instance of a class, all of its unbound methods become bound methods. If you look at the bound methods' attributes, they're the same as the unbound methods, except that im_self is c instead of None. And when you call c.fake1(), that's how it works—Python sees that c.fake1 is a bound method, so, in effect, it calls c.fake1.im_func(c.fake1.im_self). And that's how fake gets its self parameter.

(This all becomes simpler in Python 3, because there's no such thing as unbound methods anymore, but I assume you care more about Python 2 given that you're dealing with a huge mess of legacy code.)

How does the method maker get the "self" reference? It works, but I don't understand that part of it. — Mike Driscoll, Jan 14 '13 at 14:30
@MikeDriscoll: It's hard to explain… but I'll try to edit the answer. — abarnert, Jan 14 '13 at 19:30
@abarnert, not trying to be hard (just want to make sure I understand), isn't output of f.GetBaz() should be 'de' (and not 'def' as written)? — Uri London, May 30 '13 at 18:36
@Uri: You're right, and it is. Let me edit the answer. Thanks for the catch. — abarnert, May 30 '13 at 18:42

Francis Avila · Answer 2 · 2013-01-16T03:23:58.397

3

You don't necessarily need to create getter/setter methods at class-creation time. You can also create callables as-needed:

class MyClass(object):
    # get/set properties for this class: {'Name':length}
    __properties = {'LoadFee':20, 'Currency':3}

    def __init__(self):
        self._Currency = '01 34'
        self._LoadFee = 'lorem ipsum dolor sit amet consecuti'

    def __getattr__(self, name):
        basename = name[3:]
        attrname = '_'+basename
        if basename in self.__properties:
            if name.startswith('Get'):
                return lambda : getattr(self, attrname)[:self.__properties[basename]].strip()
            elif name.startswith('Set'):
                return lambda value: setattr(self, attrname, value)
        raise AttributeError(name)

m = MyClass()

print m.GetCurrency()
print m.GetLoadFee()

Although this approach is easy to understand and doesn't use any metaprogramming voodoo, it's slow and defeats introspection.

You can speed this up by "reifying" methods as you call them, i.e., attaching an instancemethod to the class as attributes on instances of the class are accessed.

# MethodType is not actually necessary because
# everything it does can be done with normal Python
# but it will make our dynamic methods look as "normal"
# and not-dynamic as possible to introspection
from types import MethodType

class MyClass(object):
    # get/set properties for this class: {'Name':length}
    __properties = {'LoadFee':20, 'Currency':3}

    def __init__(self, **args):
        props = self.__properties
        emptystr = ''
        for k in props:
            setattr(self, '_'+k, args.get(k, emptystr))

    def __getattr__(self, name):
        print '__getattr__(%s)' % name
        # we can cache accesses by "reifying" our getters/setters as they are accessed
        cls = self.__class__
        basename = name[3:]
        attrname = '_'+basename
        # nested lambdas are just for delayed evaluation
        # they cache property size lookup--assumes __properties is class-constant!
        def getsize():
            return cls.__properties[basename]
        methodfactories = {
            'Get': lambda size: lambda self: getattr(self, attrname)[:size].strip(),
            'Set': lambda size: lambda self, value: setattr(self, attrname, value),
        }
        try:
            print '  creating', name
            callable = methodfactories[name[:3]](getsize())
        except (KeyError, AttributeError) as e:
            raise AttributeError("'{}' object has no attribute '{}'".format(cls.__name__, name))
        callable.__name__ = name #cosmetics
        unboundmethod = MethodType(callable, None, cls)
        setattr(cls, name, unboundmethod) # add unbound method to the class
        # magically get bound method on the instance!
        # this works because MethodType creates a descriptor that
        # returns a bound callable in an instance context
        # and an unbound one in a class context
        return getattr(self, name) # not an infinite loop!

If you then run the following code:

m = MyClass(Currency='01', LoadFee='lorem ipsum dolor sit')
n = MyClass(Currency='02', LoadFee='amet consecuti')
try:
    # AttributeError because it hasn't been used by an instance
    MyClass.GetCurrency 
except AttributeError, e:
    print ' 7:', e
print ' 8:', m.GetCurrency()
print ' 9:', MyClass.GetCurrency
print '10:', m.GetCurrency
print '11:', n.GetCurrency
print '12:', m.GetCurrency is n.GetCurrency
print '13:', n.GetCurrency()
print '14:', m.GetLoadFee()
print '15:', m.__dict__ # no per-instance callable!

You will get the following result:

 7: type object 'MyClass' has no attribute 'GetCurrency'
 8: __getattr__(GetCurrency)
  creating GetCurrency
01
 9: <unbound method MyClass.GetCurrency>
10: <bound method MyClass.GetCurrency of <__main__.MyClass object at 0x106f87b90>>
11: <bound method MyClass.GetCurrency of <__main__.MyClass object at 0x106f87f10>>
12: False
13: 02
14: __getattr__(GetLoadFee)
  creating GetLoadFee
lorem ipsum dolor si
15: {'_Currency': '01', '_LoadFee': 'lorem ipsum dolor sit'}

Notice that getattr is only called the first time any instance accesses a special property. After that a bound method is returned from the instancemethod we dynamically created and attached to the instance's class. After the first access of an attribute, the classes and instances will be nearly indistinguishable from a method we created the "normal" way and will have exactly the same runtime speed.

edited Jan 16 '13 at 03:23

answered Jan 11 '13 at 23:31

Francis Avila

31,233
6
58
96

+1: For almost totally dynamic, but likely slower than actually creating them once in the class definition (or via a decorator). – martineau Jan 12 '13 at 03:31
@martineau: It probably is slower, but I'd be willing to bet better than even money that it won't matter for the OP's code. – abarnert Jan 14 '13 at 20:13
@abarnert: Don't know if that's such a safe assumption to make about something that has "over 150 getters and over 150 setters" -- and, relatively speaking, based on the amount of code in the `__getattr__()` method, this approach has got to be much slower than regular class-creation-time properties. Dynamism is cool, but not always necessary or worth the overhead. – martineau Jan 14 '13 at 21:11
@martineau: I wouldn't give 1000:1 odds or anything, but better than even money? Definitely. If the cost of getting and setting were at all important, they wouldn't have implemented things this way in the first place, or at least they would have cached the truncated-and-stripped version, or at least they wouldn't `strip` each string twice in a row… – abarnert Jan 14 '13 at 21:34
@abarnert: Well, it just might matter in this case given that doing it this way is over 5 times slower that it was originally or implemented as in your answer (and even though it only does one `strip()` call). You can see for yourself with my [benchmarking code](http://pastebin.com/QDtayYdX). – martineau Jan 15 '13 at 01:24
@martineau: Now compare how long it takes to call my "fast" version vs. a version that caches the truncated and stripped value instead of recomputing it every time. And then compare that to accessing a data attribute instead of calling a method. They're already probably a couple orders of magnitude slower than they ought to be, which implies that it probably doesn't matter. Yes, another 5x could push that over the edge, but I'd say more likely than not, they wouldn't even notice. – abarnert Jan 15 '13 at 01:46
@abarnert: Caching the results of `__getattr__()` improved its performance by about 1.6x, which is still 3.4x slower than the faster versions. Even so, the results remain consistent with my original comment its dynamic nature also makes slower than other methods. I'm a little surprised by the fact that you seem to be arguing that even though it's slower, this answer is better than yours (the one accepted) because its speed likely doesn't matter. – martineau Jan 15 '13 at 18:20
@martineau: I'm not saying it's better than mine. (For example, even besides speed, mine is more readable, more introspectable, etc.) And I'm not suggesting they should speed this one up and use it anyway. I'm just suggesting that if they cared about speed at all, even the original (and my version) is way too slow—they would be caching the results of truncate and split, not calling split twice, etc. It's _possible_ that the OP has inherited code that was written by morons, but it's also possible that speed is irrelevant here. That's all I'm saying. – abarnert Jan 15 '13 at 19:46
@abarnert: OK, thanks for the clarification. I never said its speed was unacceptably slow which for "normal" classes is probably true. I don't think it's _that_ unreadable and consider it's being relatively hard to introspect a non-issue. I also suspect it might be easier to apply to a bunch of classes (or subclasses) than your approach. Totally agree it's weird that it doesn't cache somehow the results of any truncation and stripping, or at least just do it once in the setter rather than over and over in the getter. – martineau Jan 15 '13 at 22:53
@martineau, I thought this was easier to understand than class-modifying approaches and was mostly pointing out for the OP that metaprogramming is not the only way to approach this, which he seemed to assume. It also uses less runtime memory because there are no callables created until needed. But in every other way I consider this technique inferior (slower, no introspection, etc). If I were to add memoization, I would add it by dynamically adding unbound methods to the class and bound methods to the instances. – Francis Avila Jan 15 '13 at 23:02
Francis Avila: Wow, adding memoization via bound and unbound methods sounds like it could really make it unreadable, un-introspectable, etc -- but I'd like to see it if you're inclined. ;-) I only memoized the `__getattr__()` method when I was messing around trying to get even a little bit of the "orders of magnitude" speed increases @abamert seemed to think were possible. – martineau Jan 15 '13 at 23:15
@martineau, to be fair attributes that were *accessed* would be introspectable, but yes, a class or object whose inspect-able interface changes at runtime is generally Not A Good Thing for code clarity! – Francis Avila Jan 15 '13 at 23:19
1

@martineau I added a technique that creates instancemethods dynamically. It should run at exactly the same speed as a "normal" method after first access. – Francis Avila Jan 16 '13 at 03:07
FrancisAvila: Nice‼ Without factoring out the first access (since it's only 1 out of 100,000) in my updated speed test, it is _nearly_ as fast as the original and @abarnert's "getter" factory -- taking about 20% longer overall -- and is an impressive 4⅓x times faster than your initial answer. – martineau Jan 17 '13 at 03:37
@FrancisAvila: Even forgetting about speed, this _is_ actually a useful technique to show. In general, an object whose inspectable interface changes at runtime is bad—but think about cases like `ctypes`, `win32com`, `pyobjc`, or anything else where the interface is inherently dynamic and/or unpredictable. Knowing how to do this when it's appropriate is definitely useful information. – abarnert Jan 17 '13 at 19:05

score 1 · Answer 3 · edited Jan 12 '13 at 03:31

1

You could try something like this:

def getter(attr, length):
    def wrapper(self):
        value = getattr(self, attr)
        return value[:length].strip()
    return wrapper

GetCurrency = getter("_Currency", 3)

Because you can't overshoot the end of a string when slicing, the length test is no longer necessary.

edited Jan 12 '13 at 03:31

martineau

119,623
25
170
301

answered Jan 11 '13 at 22:53

BenTrofatter

2,148
1
13
13

score 0 · Answer 4 · edited May 23 '17 at 12:18

If there are literally hunderds of getters that share the same code; you could use metaclass to automate the getters creations:

def length_limiting_getter(name, maxlen):
    g = lambda self: getattr(self, "_"+name)[:maxlen].strip()
    g.__name__ = name
    return g

def add_attrs(attr_maxlens):
    def meta(class_name, base_classes, attrs):
        attrs.update((name, length_limiting_getter(name, maxlen))
                     for name, maxlen in attr_maxlens.items())
        return type(class_name, base_classes, attrs)
    return meta

Meta = add_attrs({n: maxlen for n, maxlen in zip("a b c".split(),
                                                 [1, 10, 50])})
class ClassWithManyGetters(object): # On Python 3 use: `(metaclass=Meta)` syntax
    __metaclass__ = Meta
    def __init__(self):
        for name in "abc":
            setattr(self, "_" + name, "X"*20)

c = ClassWithManyGetters()
print(c.a())
print(c.b())
print(c.c())

Output

X
XXXXXXXXXX
XXXXXXXXXXXXXXXXXXXX

The output shows that the length limiting feature works.

How do I refactor 100s of Class Methods in Python?

4 Answers4

Output