
I have a sequence of characters, a string if you will, but I want to store metadata about the origin of the string. Additionally, I want to provide a simplified constructor.

I've tried extending the str class in as many ways as Google would turn up for me. I gave up when I got to this:

class WcStr(str):
    """wc value and string flags"""

    FLAG_NIBBLES = 8 # Four Bytes

    def __init__(self, value, flags):
        super(WcStr, self).__init__()
        self.value = value
        self.flags = flags

    @classmethod
    def new_nibbles(cls, nibbles, flag_nibbles=None):
        if flag_nibbles is None:
            flag_nibbles = cls.FLAG_NIBBLES

        return cls(
            nibbles[flag_nibbles+1:],
            nibbles[:flag_nibbles]
        )

When I comment out both arguments to the cls() call in the @classmethod, it gives me this error:

TypeError: __init__() takes exactly 3 arguments (1 given)

Pretty typical wrong-number-of-args error.

With two arguments (e.g. as shown in the example code):

TypeError: str() takes at most 1 argument (2 given)

I've tried changing the __init__'s args and the super().__init__'s args; neither seems to make any change.

With only one argument passed to the cls(...) call, as the str class's error asks, I get this:

TypeError: __init__() takes exactly 3 arguments (2 given)

So I can't win here. What's gone wrong?


P.S. This should be a second post, but what property does str's raw string value get put into? I'd like to overload as little of the str class as I can to add this metadata into the constructor.

ThorSummoner
  • Python's raw string value doesn't get put into _any_ attribute. There's no "raw string"; its value just _is_ a string, and if that were in any attribute, it would have the same type as `str`. – abarnert May 05 '15 at 05:50
  • Meanwhile, you need to read about [`__new__`](https://docs.python.org/2/reference/datamodel.html#object.__new__). It's one of those things they don't teach you because most types only need an initializer, not a constructor… but now you're trying to subclass an immutable type, so you _do_ need a constructor. – abarnert May 05 '15 at 05:50
  • Finally, you're explicitly calling the super's `__init__` with no arguments. At best, that's going to give you an empty string. And since strings are immutable, it'll be empty forever. You probably don't want that, but I'm not sure what you _do_ want, from the rest of your code. (Are you sure you even wanted a `str` subclass at all, rather than just something that _owns_ a `str` and duck-types as str-like by delegating a lot of methods to it?) – abarnert May 05 '15 at 05:52

2 Answers


This is exactly what the __new__ method is for.

In Python, creating an object actually has two steps. In pseudocode:

value = the_class.__new__(the_class, *args, **kwargs)
if isinstance(value, the_class):
    value.__init__(*args, **kwargs)

The two steps are called construction and initialization. Most types don't need anything fancy in construction, so they can just use the default __new__ and define an __init__ method—which is why tutorials, etc. only mention __init__.
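The order of the two steps can be observed directly. A minimal sketch, using a hypothetical Traced class that is not part of the question's code and only records which hook ran:

```python
class Traced(object):
    """Hypothetical class that records which hook ran, and in what order."""

    calls = []

    def __new__(cls, *args, **kwargs):
        Traced.calls.append("__new__")
        # object.__new__ accepts no extra arguments, so pass only cls
        return super(Traced, cls).__new__(cls)

    def __init__(self, label):
        Traced.calls.append("__init__")
        self.label = label


t = Traced("demo")
print(Traced.calls)  # ['__new__', '__init__']
```

Both hooks receive the same arguments from the call; here __new__ simply ignores them and __init__ uses them.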

But str objects are immutable, so the initializer can't do the usual job of setting up the object's value: by the time __init__ runs, the string's contents are already fixed and can't be changed.

So, if you want to change what the str actually holds, you have to override its __new__ method, and call the super __new__ with your modified arguments.

In this case, you don't actually want to do that… but you do want to make sure str.__new__ doesn't see your extra arguments, so you still need to override it, just to hide those arguments from it.


Meanwhile, you ask:

what property does str's raw string value get put into?

It doesn't. What would be the point? Its value is a string, so you'd have a str which had an attribute which was the same str which had an attribute which etc. ad infinitum.

Under the covers, of course, it has to be storing something. But that's under the covers. In particular, in CPython, the str class is implemented in C, and it contains, among other things, a C char * array of the actual bytes used to represent the string. You can't access that directly.

But, as a subclass of str, if you want to know your value as a string, that's just self. That's the whole point of being a subclass, after all.
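As a small illustration of "the value is just self", here is a sketch with a hypothetical Shout subclass (the name and method are made up for this example):

```python
class Shout(str):
    """Hypothetical str subclass; `self` already is the string value."""

    def shout(self):
        # No self.value needed: every str method is available on self
        return self.upper() + "!"


s = Shout("hi")
print(s.shout())
print(s == "hi")
```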


So:

class WcStr(str):
    """wc value and string flags"""

    FLAG_NIBBLES = 8 # Four Bytes

    def __new__(cls, value, *args, **kwargs):
        # explicitly only pass value to the str constructor
        return super(WcStr, cls).__new__(cls, value)

    def __init__(self, value, flags):
        # ... and don't even call the str initializer 
        self.flags = flags

Of course you don't really need __init__ here; you could do your initialization along with your construction in __new__. But if you don't intend for flags to be an immutable, only-set-during-construction kind of value, it makes more conceptual sense to do it in the initializer, just like any normal class.
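To check that this plays well with the simplified constructor from the question, here is a sketch that puts the new_nibbles classmethod back in unchanged. The input "0000000f:hello" is an assumed format (eight flag characters, one separator, then the value), chosen only because the question's slicing skips one character after the flags:

```python
class WcStr(str):
    """wc value and string flags"""

    FLAG_NIBBLES = 8  # Four bytes

    def __new__(cls, value, *args, **kwargs):
        # Only the value reaches str.__new__; flags are hidden from it
        return super(WcStr, cls).__new__(cls, value)

    def __init__(self, value, flags):
        self.flags = flags

    @classmethod
    def new_nibbles(cls, nibbles, flag_nibbles=None):
        if flag_nibbles is None:
            flag_nibbles = cls.FLAG_NIBBLES
        # Slicing copied from the question: skips one character after the flags
        return cls(nibbles[flag_nibbles + 1:], nibbles[:flag_nibbles])


s = WcStr.new_nibbles("0000000f:hello")  # assumed input format
print(s, s.flags)
```

Because cls(...) goes through both __new__ and __init__, the classmethod needs no changes.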


Meanwhile:

I'd like to overload as little of the str class as I can

That may not do what you want. For example, str.__add__ and str.__getitem__ are going to return a plain str, not an instance of your subclass. If that's fine, then you're done. If not, you will have to override all of those methods and change them to wrap up the return value with the appropriate metadata. (You can do this programmatically by generating wrappers at class definition time. A __getattr__ method that generates wrappers on the fly only works for a wrapper class that owns a str rather than subclassing it, because __getattr__ is only called when normal lookup fails, and a str subclass already inherits all of these methods.)
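A rough sketch of generating such wrappers at class definition time. The Tagged class, the _wrap helper, and the list of wrapped method names are all hypothetical, and a real version would have to decide which methods to cover and how the metadata should propagate:

```python
class Tagged(str):
    """Hypothetical str subclass carrying a `flags` attribute."""

    def __new__(cls, value, flags=None):
        obj = super(Tagged, cls).__new__(cls, value)
        obj.flags = flags
        return obj


def _wrap(name):
    str_method = getattr(str, name)

    def method(self, *args, **kwargs):
        result = str_method(self, *args, **kwargs)
        # Re-wrap only plain str results; ints, lists, NotImplemented pass through
        if type(result) is str:
            return Tagged(result, self.flags)
        return result

    method.__name__ = name
    return method


# Attach the wrappers right after the class body is defined
for _name in ("__add__", "__getitem__", "upper", "lower", "strip"):
    setattr(Tagged, _name, _wrap(_name))

t = Tagged("hello", flags="0000000f")
u = t + " world"
print(type(u).__name__, u, u.flags)
```

Methods that are not in the list, such as find or split here, keep their inherited behavior and return ordinary values.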


One last thing to consider: the str constructor doesn't take exactly one argument.

It can take 0:

str() == ''

And, while this isn't relevant in Python 2, in Python 3 it can take 2:

str(b'abc', 'utf-8') == 'abc'

Plus, even when it takes 1 argument, it obviously doesn't have to be a string:

str(123) == '123'

So… are you sure this is the interface you want? Maybe you'd be better off creating an object that owns a string (in self.value), and just using it explicitly. Or even using it implicitly, duck-typing as a str by just delegating most or all of the str methods to self.value?
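For comparison, a minimal sketch of the owning approach. WcValue is a hypothetical name, and this version delegates only through __getattr__ plus two explicitly defined special methods:

```python
class WcValue(object):
    """Hypothetical wrapper that owns a str instead of subclassing it."""

    def __init__(self, value, flags):
        self.value = value
        self.flags = flags

    def __getattr__(self, name):
        # Called only when normal lookup fails, so value/flags never land here
        # once set; the guard avoids infinite recursion if value is missing.
        if name == "value":
            raise AttributeError(name)
        return getattr(self.value, name)

    # Special methods are looked up on the type and bypass __getattr__,
    # so the ones you need must be defined explicitly.
    def __str__(self):
        return self.value

    def __len__(self):
        return len(self.value)


w = WcValue("hello", "0000000f")
print(w.upper(), len(w), w.flags)
```

With this design the constructor signature is entirely yours, at the cost of the object no longer passing isinstance(w, str) checks.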

lupodellasleppa
abarnert

Instead of __init__, try __new__:

def __new__(cls, value, flags):    
    obj = str.__new__(cls, value)
    obj.flags = flags
    return obj    
gus27