
The majority of my programming experience has been with C++. Inspired by Bjarne Stroustrup's talk here, one of my favorite programming techniques is "type-rich" programming: developing robust new data types that not only reduce the amount of code I have to write by wrapping functionality into the type (for example, instead of newVec.x = vec1.x + vec2.x; newVec.y = ... etc., vector addition becomes just newVec = vec1 + vec2), but also reveal problems in my code at compile time through the strong type system.

A recent project I have undertaken in Python 2.7 requires integer values that have upper and lower bounds. My first instinct is to create a new data type (class) that has all the same behavior as a normal number in Python, but always stays within its (dynamic) boundary values.

class BoundInt:
    def __init__(self, target = 0, low = 0, high = 1):
        self.lowerLimit = low
        self.upperLimit = high
        self._value = target
        self._balance()

    def _balance(self):
        if (self._value > self.upperLimit):
            self._value = self.upperLimit
        elif (self._value < self.lowerLimit):
            self._value = self.lowerLimit
        self._value = int(round(self._value))

    def value(self):
        self._balance()
        return self._value

    def set(self, target):
        self._value = target
        self._balance()

    def __str__(self):
        return str(self._value)

This is a good start, but it requires accessing the meat of these BoundInt types like so:

x = BoundInt()
y = 4
x.set(y)           #it would be nicer to do something like x = y
print y            #prints "4"
print x            #prints "1"
z = 2 + x.value()  #again, it would be nicer to do z = 2 + x
print z            #prints "3" 

We can add a large number of python's "magic method" definitions to the class to add some more functionality:

def __add__(self, other):
    return self._value + other

def __sub__(self, other):
    return self._value - other

def __mul__(self, other):
    return self._value * other

def __div__(self, other):
    return self._value / other

def __pow__(self, power):
    return self._value**power

def __radd__(self, other):
    return self._value + other

#etc etc

Now the code is rapidly exploding in size, and there is a ton of repetition in what is being written, for very little return; this doesn't seem very pythonic at all.

Things get even more complicated when I start to want to construct BoundInt objects from normal Python numbers (integers?) and from other BoundInt objects:

x = BoundInt()
y = BoundInt(x)
z = BoundInt(4)

This, as far as I'm aware, requires rather large/ugly if/else type-checking statements within the BoundInt() constructor, as Python does not support (C-style) overloading. Something like the hypothetical constructor sketched below.
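For example (a sketch of the kind of type checking I mean, not code I actually want to write):

def __init__(self, target = 0, low = 0, high = 1):
    # hypothetical: branch on the type of 'target'
    if isinstance(target, BoundInt):
        self.lowerLimit = target.lowerLimit
        self.upperLimit = target.upperLimit
        self._value = target._value
    else:
        self.lowerLimit = low
        self.upperLimit = high
        self._value = target
    self._balance()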

All of this feels terribly like trying to write C++ code in Python, a cardinal sin if one of my favorite books, Code Complete 2, is taken seriously. I feel like I am swimming against the dynamic typing current instead of letting it carry me forward.

I very much want to learn to write Python 'pythonically'. What is the best way to approach this sort of problem domain? What are good resources for learning proper pythonic style?

Sam Coulter
  • I'm not sure the equivalent C++ code to achieve this would be much less repetitive. Templates could help with that, but there's a lot to be done with Python's metaprogramming capabilities - it seems you could safely generate the wrapped `__double_underscore__` methods automatically. – millimoose Nov 06 '12 at 01:07
  • The C++ code _is_ repetitive—boost.operator does help quite a bit, but not enough. I think the OP was hoping that Python would be a lot less complicated. – abarnert Nov 06 '12 at 01:24
  • Another problem with this implementation that you should be aware of is that the type of return value after operating on it is a plain numeric type, not your bounded type. So further operations on the results will not be what you expect. – Keith Nov 06 '12 at 01:30
  • @abarnert has it right, I know and respect why the C++ code is repetitive, I thought (perhaps mistakenly) that python allowed you to be much more succinct. – Sam Coulter Nov 06 '12 at 01:32
  • @SamCoulter An example of what I had in mind: http://ideone.com/BJExEL. I believe you *have* to insert new functions into `locals()` - Python looks up `__magic__` methods through the class of an object as an optimisation, bypassing attribute access that could be overridden using `__getattr__()`. This might make it impossible to analogously delegate magic attributes using properties. That said, I'm not sure if there's any magic attribute that should ever need overriding in this way. – millimoose Nov 06 '12 at 01:37
  • @millimoose: Inserting into `locals` definitely isn't a "have to"; there are multiple ways to do this (and different with different Python versions), from having the instances use a custom getattr to do the lookup at call time, to building descriptors from scratch and assigning them in a metaclass… Meanwhile, to the OP: which version(s) of Python do you care about? Do you want to write lots of classes with identical forwarding, lots of classes that each do it a little different, or just one class? Is this a learning experience or for professional code? – abarnert Nov 06 '12 at 02:11
  • @abarnert On 2.7.x, implementing `__getattr__()` (or `__getattribute__`) doesn't work when resolving magic methods implementing operators: http://ideone.com/r3dXA1. (See http://docs.python.org/2/reference/datamodel.html#special-method-lookup-for-new-style-classes) For this use case, a metaclass is just a more convoluted way of inserting into the same dict as `locals()` in class definition scope. The idea with inserting descriptors does have an actual advantage in that the attribute access returns the actual underlying method instead of a wrapper with no meaningful metadata. – millimoose Nov 06 '12 at 03:03
  • @millimoose: Ah, but in 2.7.x, the OP's class is a classic class, and that rule only applies to new-style classes. The larger point is that what you need to do differs between Python versions, so if the OP doesn't specifically say "Python 2.7", the only thing we can really do is write generic answers, with some signpoints to what to watch out for between versions. – abarnert Nov 06 '12 at 19:48
  • This is in 2.7.x, I believe I tagged the question as 2.7, but I will update the OP as well. – Sam Coulter Nov 06 '12 at 19:54
  • Is `BoundInt` intentionally written as a classic class rather than a new-style class? (PS, all of this gets a lot simpler if you can switch to Python 3.x, especially 3.3+. But then of course you have the new problem that most linux/Mac/etc. boxes don't come with python3 installations, some of the modules you want to use haven't been ported yet, half the example code you find on the net won't run without changes, … Fun tradeoff, eh?) – abarnert Nov 07 '12 at 00:56
  • Unfortunately this will run on an embedded system, with a number of libraries depending on 2.7, so no luck on 3.3 :( – Sam Coulter Nov 07 '12 at 01:56

4 Answers


There's plenty of code in the standard library, in popular PyPI modules, and in ActiveState recipes that does this kind of thing, so you're probably better off reading examples than trying to figure it out from first principles. Also, note that this is pretty similar to creating a list-like or dict-like class, which there are even more examples of.

However, there are some answers to what you want to do. I'll start with the most serious, then work backward.

Things get even more complicated when I start to want to construct BoundInt objects from normal python numbers (integers?), and other BoundInt objects … Which, as far as I'm aware requires the use of rather large/ugly if/else type checking statements within the BoundInt() constructor, as python does not support (c style) overloading.

Ah, but think about what you're doing: You're constructing a BoundInt from anything that can act like an integer, including, say, an actual int or a BoundInt, right? So, why not:

def __init__(self, target, low, high):
    self.target, self.low, self.high = int(target), int(low), int(high)

I'm assuming you've already added a __int__ method to BoundInt, of course (the equivalent of a C++ explicit operator int() const).
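For example, with your attribute names, it can be as simple as:

def __int__(self):
    return self._value   # or self.target, with the constructor sketched above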

Also, keep in mind that the lack of overloading isn't as serious as you're thinking coming from C++, because there is no "copy constructor" for making copies; you just pass the object around, and all that gets taken care of under the covers.

For example, imagine this C++ code:

BoundInt foo(BoundInt param) { BoundInt local = param; return local; }
BoundInt bar;
BoundInt baz = foo(bar);

This copies bar to param, param to local, local to an unnamed "return value" variable, and that to baz. Some of these will be optimized out, and others (in C++11) will use move instead of copy, but still, you've got 4 conceptual invocations of the copy/move constructors/assignment operators.

Now look at the Python equivalent:

def foo(param): local = param; return local
bar = BoundInt()
baz = foo(bar)

Here, we've just got one BoundInt instance—the one that was explicitly created—and all we're doing is binding new names to it. Even assigning baz as a member of a new object that outlives the scope of bar and baz won't make a copy. The only thing that makes a copy is explicitly calling BoundInt(baz) again. (This isn't quite 100% true, because someone can always inspect your object and attempt to clone it from the outside, and pickle, deepcopy, etc. may actually do so… but in that case, they're still not calling a "copy constructor" that you or the compiler wrote.)
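You can see this for yourself, using foo and bar from the snippet above:

>>> bar = BoundInt()
>>> baz = foo(bar)
>>> baz is bar
True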

Now, what about forwarding all those operators to the value?

Well, one possibility is to do it dynamically. The details depend on whether you're in Python 3 or 2 (and, for 2, how far back you need to support). But the idea is you just have a list of names, and for each one, you define a method with that name that calls the method of the same name on the value object. If you want a sketch of this, provide the extra info and ask, but you're probably better off looking for examples of dynamic method creation.
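Here's one minimal sketch of the idea for Python 2.7; the list of names is illustrative and incomplete, and it assumes the __int__ method mentioned above:

def _make_forwarder(name):
    # factory function, so each generated method captures its own name
    def forwarder(self, other):
        # coerce through __int__ so BoundInt-with-BoundInt operations work;
        # note this truncates floats, which may or may not be what you want
        return getattr(self._value, name)(int(other))
    forwarder.__name__ = name
    return forwarder

for _name in ['__add__', '__radd__', '__sub__', '__rsub__',
              '__mul__', '__rmul__', '__div__', '__rdiv__']:
    setattr(BoundInt, _name, _make_forwarder(_name))

Note that these forwarders still return plain int results, as Keith's comment on the question points out; wrapping the results back into BoundInt is a separate design decision (see the underspecification discussion below).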

So, is that Pythonic? Well, it depends.

If you're creating dozens of "integer-like" classes, then yes, it's certainly better than copy-paste code or adding a "compile-time" generation step, and it's probably better than adding an otherwise-unnecessary base class.

And if you're trying to work across many versions of Python and don't want to have to remember "which version am I supposed to stop supplying __cmp__ to act like int again?" type questions, I might go even further and get the list of methods out of int itself (take dir(int()) and blacklist out a few names).
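For instance (a rough sketch; the blacklist here is illustrative, not complete):

# names of int's callable special methods, minus ones we must not forward
_blacklist = set(['__class__', '__init__', '__new__', '__getattribute__',
                  '__setattr__', '__delattr__', '__reduce__',
                  '__reduce_ex__', '__sizeof__', '__subclasshook__'])
_names = [name for name in dir(int()) if name.startswith('__')
          and name not in _blacklist and callable(getattr(int, name))]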

But if you're just doing this one class, in, say, just Python 2.6-2.7 or just 3.3+, I think it's a toss-up.

A good class to read is the fractions.Fraction class in the standard library. It's clearly-written pure Python code, and it partially demonstrates both the dynamic and explicit mechanisms (because it explicitly defines each special method in terms of generic dynamic forwarding functions); if you've got both 2.x and 3.x around, you can compare and contrast the two.

Meanwhile, it seems like your class is underspecified. If x is a BoundInt and y is an int, should x+y really return an int (as it does in your code)? If not, do you need to bound it? What about y+x? What should x+=y do? And so on.

Finally, in Python, it's often worth making "value classes" like this immutable, even if the intuitive C++ equivalent would be mutable. For example, consider this:

>>> i = BoundInt(3, 0, 10)
>>> j = i
>>> i.set(5)
>>> print j
5

I don't think you'd expect this. This wouldn't happen in C++ (for a typical value class), because j = i would create a new copy, but in Python, it's just binding a new name to the same object. (It's equivalent to BoundInt &j = i, not BoundInt j = i.)

If you want BoundInt to be immutable, besides eliminating obvious things like set, also make sure not to implement __iadd__ and friends. If you leave out __iadd__, i += 2 will be turned into i = i.__add__(2): in other words, it will create a new instance, then rebind i to that new instance, leaving the old one alone.
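For example, assuming a hypothetical immutable variant of BoundInt with its operators defined:

>>> i = BoundInt(3, 0, 10)
>>> j = i
>>> i += 2        # no __iadd__: this rebinds i to the result of i.__add__(2)
>>> print i
5
>>> print j       # j still names the original, untouched object
3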

abarnert
  • This answer is full of useful information, I especially like the way you have the constructor implemented. Can you elaborate on what you mean by "passing the object around gets taken care of under the covers" -Thanks – Sam Coulter Nov 06 '12 at 01:39
  • A further edit explaining the other consequence of passing bindings around instead of copies: you have to think about mutability a bit more carefully. – abarnert Nov 06 '12 at 19:04

There are likely many opinions on this. But regarding the proliferation of special methods, you will just have to do that to make it complete. At least you only do it once, in one place. Also, the built-in number types can be subclassed; that's what I did for a similar implementation, which you can look at.
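A minimal sketch of that approach (Python 2.7; the class name and bounds here are illustrative, and the clamping has to happen in __new__ because int is immutable):

class BoundedInt(int):
    low, high = 0, 100   # illustrative bounds

    def __new__(cls, value=0):
        # clamp before creation; an immutable int can't be changed in __init__
        return int.__new__(cls, max(cls.low, min(cls.high, int(value))))

print BoundedInt(150)     # prints 100
print BoundedInt(7) + 1   # prints 8 (note: the result is a plain int, not BoundedInt)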

Keith
  • I like the suggestion of subclassing the built-in. I think that ensures a lot of the behavior once you get the fine-tuning for the limits in place. – g.d.d.c Nov 06 '12 at 01:21
  • @g.d.d.c I was under the impression that subclassing a built-in is, in fact, a great way to unexpectedly get "default" behaviour unless you override *everything*. (Or only what you're meant to override, as per the documentation.) Internal implementations of builtins may take shortcuts that don't necessarily take your overrides into consideration. – millimoose Nov 06 '12 at 01:40
  • @millimoose: That's partly an obsolete impression; they've been gradually fixing the types to follow their documentation, and by 2.6 it's mostly done (which is why they finally dropped the special "inheritable" `dict` and `list` classes). But you're still right that it's easier to debug with composition (where missing a method means `AttributeError`) than inheritance (where it often means silently doing the wrong thing, or, worse, silently doing the right thing 99.99% of the time but not 100%…). – abarnert Nov 06 '12 at 02:04
  • @millimoose: Also, there's an easy way to catch that: Just write an `assert` that there are no callable members of `long` that aren't overridden, except for a short whitelist, and then write a unit test that calls each defined method and each whitelisted method to make sure they all work as expected. (Is `unsigned.bit_length` correct without overriding it? Probably, but… I wouldn't trust my instincts without testing it.) – abarnert Nov 06 '12 at 02:19
  • @abarnert Do you have an example of what those changes made possible at hand? I tried what, to me, intuitively seemed like a trivial example (http://ideone.com/QxRwx8), and it obviously doesn't do anything useful. My point wasn't that there's some technical impediment that would prevent subclassing a builtin, or cause default behaviour to be called instead of your override. What I meant is that built-in are not explicitly designed to allow for the sort of customisation my example uses. (Where subclassing a built-in would give you some behaviour "for free".) – millimoose Nov 06 '12 at 02:26
  • @abarnert To put it another way, I'm certain there's valid use cases for subclassing built-ins. Ensuring behaviour consistent – in an intuitive way – with the superclass (without explicitly overriding most of its methods) doesn't seem to be one of them. (As opposed to, say, subclassing the ABCs in `collections`, where this is an explicit feature.) – millimoose Nov 06 '12 at 02:31
  • @millimoose Another advantage of subclassing is that you can use the special class where a more generic type is checked for elsewhere. That is, `isinstance(unsigned(42), long)` is True. whereas for a composed class it would not be. – Keith Nov 06 '12 at 02:32
  • @Keith Good catch, although type checking is one of the less "pythonic" things to do, so that's a weakish argument to be made against supporting that. Duck-typing or type coercion should probably be tried first. (For scalars. For collections, there's the ABCs.) Nonetheless, as I said, I don't think you should avoid subclassing built-ins; just that you need to be more careful about whether you implement the contract of the superclass correctly, not less so. – millimoose Nov 06 '12 at 02:43
  • @millimoose Agreed, however type checking is sometimes done, but even with duck typing it is expected that a numeric operation yield a new value that is at least the same type as the operator. Anyway, it works for me. ;-) – Keith Nov 06 '12 at 02:49
  • @Keith Actually numeric operations on Python numbers often give you a different type, in order to preserve mathematical expectations. `sys.maxint + sys.maxint` gives you a long integer, so that the math works. `2 * 2.3` gives you a float, so that the math works. In Python2 `1 / 3` gives you an `int`, and the math doesn't work, an this causes a lot of bugs and was regarded as a mistake, which is why there is `from __future__ import division` and why integer division gives you a float in Python 3. – Ben Nov 06 '12 at 03:00
  • @Ben Yes, that's what I meant by "at least the same", meaning you don't lose anything, but you may be promoted to a type that can contain the number. It is a bug to demote a number to one with a smaller range. However, that behavior you describe can be a problem in some contexts. Hence the existence of this unsigned, range-limited class. – Keith Nov 06 '12 at 03:08
  • @Ben And, to make things worse, Python doesn't implement a numerical tower or any sort of marker type (analogous to `basestring`) to designate numbers. So explicitly type checking against (specifically) numeric types isn't very helpful, unless you check against *all* of them. (Or at least a few of them every time.) – millimoose Nov 06 '12 at 03:10

Your set method is an abomination. You do not create a number with a default value of zero and then change the number into some other number; that is very much trying to program C++ in Python, and it will cause you endless headaches if you actually want to treat these the same way you do numbers. Every time you pass one around, what is passed is a reference to the same object (like everything in Python), so you'll end up with large amounts of aliasing in things you think you can treat like numbers. You will almost certainly encounter bugs from mutating the value of numbers you don't realise are aliased, or from expecting to retrieve a value stored in a dictionary with a BoundInt as a key by providing another BoundInt with the same value.
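To see the problem concretely, using the BoundInt class from the question:

x = BoundInt(5, 0, 10)
y = x            # y is another name for the same object, not a copy
y.set(7)
print x          # prints "7" -- x changed, even though we only touched y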

To me, high and low aren't data values associated with a particular BoundInt value, they're type parameters. I want a number 7 in the type BoundInt(1, 10), not a number 7 which is constrained to be between 1 and 10, all of which being a value in the type BoundInt.

If I really wanted to do something like this, the approach I would take would be to subclass int and treat BoundInt as a class factory: you give it a range, and it gives you the type of integers restricted to be in that range. You can apply that type to any "int-like" object and it will give you a value clamped to that range. Something like:

_bound_int_cache = {}
def BoundInt(low, high):
    try:
        return _bound_int_cache[(low, high)]
    except KeyError:
        class Tmp(int):
            def __new__(cls, value):
                # clamp in __new__; int is immutable, so __init__ is too late
                value = max(value, cls.low)
                value = min(value, cls.high)
                return int.__new__(cls, value)

        # assign the bounds after the class body: writing "low = low" inside
        # the body would raise NameError, because a name assigned in a class
        # body is not looked up in the enclosing function's scope
        Tmp.low = low
        Tmp.high = high
        Tmp.__name__ = 'BoundInt({}, {})'.format(low, high)
        _bound_int_cache[(low, high)] = Tmp
        return _bound_int_cache[(low, high)]

(The cache is just to ensure that two different attempts to get the BoundInt type for the same low/high values give you the exact same class, not two different classes that behave the same way. Probably wouldn't matter in practice most of the time, but it seems nicer.)

You would use this like:

B = BoundInt(1, 10)
x = B(7)

The "class factory" approach means that if you have a small number of meaningful ranges in which you want to bound your integers, you can create the classes for those ranges globally (with meaningful names), and then use them exactly like regular classes.

Subclassing int makes these objects immutable (which is why the initialisation had to be done in __new__), which frees you from aliasing bugs (which people don't expect to have to worry about when they're programming with simple value types like numbers, and for good reason). It also gives you all the integer methods for free, and so these BoundInt types behave exactly as int, except that when you create one the value is clamped by the type. Unfortunately that means that all operations on these types return int objects, not BoundInt objects.
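For example:

B = BoundInt(1, 10)
x = B(7)
print x + 1          # 8
print type(x + 1)    # <type 'int'>, not BoundInt(1, 10)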

If you could come up with a way of reconciling the low/high values for the two different values involved in e.g. x + y, then you can override the special methods to make them return BoundInt values. The approaches that spring to mind are:

  1. Take the left operand's bounds and ignore the right operand (seems messy and asymmetrical; violates the assumption that x + y = y + x)
  2. Take the maximum low value and the minimum high value. It's nicely symmetrical, and you can treat numeric values that don't have low and high values as if they were -sys.maxint - 1 and sys.maxint (i.e. just use the bounds from the other value). Doesn't make a whole lot of sense if the ranges don't overlap at all, because you'll end up with an empty range, but operating on such numbers together probably doesn't make a whole lot of sense anyway.
  3. Take the minimum low value and the maximum high value. Also symmetrical, but here you probably want to explicitly ignore normal numbers rather than pretending they're BoundInt values that can range over the whole integer range.

Any of the above could work, and any of the above will probably surprise you at some point (e.g. negating a number constrained to be in a positive range will always give you the smallest positive number in the range, which seems weird to me).

If you take this approach, you probably don't want to subclass int. Because if you have normalInt + boundedInt, then normalInt would handle the addition without respecting your code. You instead want it not to recognise boundedInt as an int value, so that int's __add__ won't work and will give your class a chance to try __radd__. But I would still treat your class as "immutable", and make every operation that comes up with a new number construct a new object; mutating numbers in place is practically guaranteed to cause bugs sometime.

So I'd handle that approach something like this:

class BoundIntBase(object):
    # Don't use this class directly; use a subclass that specifies low and high as
    # class attributes.
    def __init__(self, value):
        self.value = min(self.high, max(self.low, int(value)))

    def __int__(self):
        return self.value

# add binary operations to BoundIntBase
for method in ['__add__', '__radd__', '__sub__', '__rsub__',
               '__mul__', '__rmul__']:   # etc.
    def tmp(self, other, method=method):
        # method=method captures the current name; a plain closure would
        # leave every generated method calling the last name in the list
        try:
            low = min(self.low, other.low)
            high = max(self.high, other.high)
        except AttributeError:
            cls = type(self)
        else:
            cls = BoundInt(low, high)
        v = getattr(int(self), method)(int(other))
        return cls(v)
    tmp.__name__ = method
    setattr(BoundIntBase, method, tmp)


_bound_int_cache = {}
def BoundInt(low, high):
    try:
        return _bound_int_cache[(low, high)]
    except KeyError:
        # subclass BoundIntBase rather than int, per the discussion above;
        # BoundIntBase.__init__ does the clamping
        class Tmp(BoundIntBase):
            pass

        Tmp.low = low
        Tmp.high = high
        Tmp.__name__ = 'BoundInt({}, {})'.format(low, high)
        _bound_int_cache[(low, high)] = Tmp
        return _bound_int_cache[(low, high)]

Still seems like more code than it should be, but what you're trying to do is actually more complicated than you think it is.

Ben
  • Your analysis of the edge cases does make it seem that C++ operator overloading (a pox upon its name) would in fact help make sure that all the edge cases you mention are dealt with. (Since you're forced to implement all possible combinations of operands by hand.) Whereas Python makes it easier to generate the implementations if you know what you're doing. (Then again this shouldn't be surprising as it's just a special case of the greater static-vs-dynamic distinction.) – millimoose Nov 06 '12 at 03:25

A type that behaves exactly like numbers in all situations needs many special methods, due to the rich syntax support for numbers in Python (it seems no other kind of type requires so many methods; e.g., it is much simpler to define types that behave like a list or dict in Python: a couple of methods and you have a Sequence). There are several ways to make the code less repetitive.

ABCs such as numbers.Integral provide default implementations for some methods; e.g., if __add__ and __radd__ are implemented in a subclass, then __sub__ and __rsub__ are available automatically.

fractions.Fraction uses _operator_fallbacks to define the __r*__ versions and to provide fallback operators that deal with other numeric types:

__op__, __rop__ = _operator_fallbacks(monomorphic_operator, operator.op)

Python allows you to generate/modify a class dynamically in a factory function/metaclass, e.g., Can anyone help condense this Python code?. Even exec can be used in (very) rare cases, e.g., namedtuple().

Numbers are immutable in Python, so you should use __new__ instead of __init__.

Rare cases that are not covered by __new__ can be handled by from_sometype(cls, d: sometype) -> your_type class methods. And in reverse, cases that are not covered by special methods can use as_sometype(self) -> sometype methods.
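For example (a hypothetical sketch of that naming convention, not code from the question):

class BoundInt(object):
    def __init__(self, value, low, high):
        self._value = min(high, max(low, int(value)))
        self._low, self._high = low, high

    @classmethod
    def from_float(cls, f, low, high):
        # alternate constructor for a case the normal constructor doesn't cover
        return cls(int(round(f)), low, high)

    def as_int(self):
        # explicit conversion out, complementing __int__ and friends
        return self._value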

A simpler solution in your case might be to define a higher-level type specific to your application domain. The number abstraction might be too low-level; e.g., decimal.Decimal is more than 6 KLOC.

jfs