
I've encountered a situation where subclassing unicode results in a DeprecationWarning on Python versions prior to 3.3 and an error on Python 3.3:

# prove that unicode.__init__ accepts parameters
s = unicode('foo')
s.__init__('foo')
unicode.__init__(s, 'foo')

class unicode2(unicode):
    def __init__(self, other):
        super(unicode2, self).__init__(other)

s = unicode2('foo')

class unicode3(unicode):
    def __init__(self, other):
        unicode.__init__(self, other)

s = unicode3('foo')

Curiously, the first three statements (lines 2 through 4) raise no warnings or errors; those occur on lines 8 and 14 instead. Here's the output on Python 2.7:

> python -Wd .\init.py
.\init.py:8: DeprecationWarning: object.__init__() takes no parameters
  super(unicode2, self).__init__(other)
.\init.py:14: DeprecationWarning: object.__init__() takes no parameters
  unicode.__init__(self, other)

The code is simplified to exemplify the issue. In a real-world application, I would perform more than simply calling the super __init__.

It appears from the first three lines that the unicode class implements __init__ and that method accepts at least a single parameter. However, if I want to call that method from a subclass, I appear to be unable to do so, whether I invoke super() or not.

Why is it okay to call unicode.__init__ on a unicode instance but not on a unicode subclass? What is an author to do if subclassing the unicode class?

Jason R. Coombs
  • I don't think you're *supposed* to subclass strings... Is there any particular reason why you can't just create a custom class using a Unicode string internally? You can easily make an object walk and talk like a Unicode string. – Hubro Feb 09 '13 at 01:01
  • I don't know why the problems happen, but subclassing unicode does seem really unusual. – Ned Batchelder Feb 09 '13 at 01:08
  • In a perfect world, one should be able to subclass any object. Subclassing strings in particular is useful when the subclass should act _as_ a string. Trying to emulate all of the interfaces of a string is much harder and more error-prone than simply subclassing. For example, how would you implement a class to return true for `isinstance(my_subclass_instance, basestring)` without subclassing? See https://bitbucket.org/yougov/pmxbot/src/6415472739/pmxbot/core.py#cl-48 and https://github.com/jaraco/path.py/blob/ba38fc205e/path.py#L106 for useful examples. – Jason R. Coombs Feb 09 '13 at 01:12
  • I should also mention the same issue exists with `datetime.datetime`, so it's not unique to strings. – Jason R. Coombs Feb 09 '13 at 01:15
  • Not sure why you want to call `unicode.__init__` explicitly. I think using `collections.UserString` would be easier if you want to replace the underlying string. – Kabie Feb 09 '13 at 02:57

1 Answer


I suspect the issue comes from the fact that unicode is immutable.

After a unicode instance is created, it cannot be modified. So, any initialization logic is going to be in the __new__ method (which is called to do the instance creation), rather than __init__ (which is called only after the instance exists).

A subclass of an immutable type doesn't have the same strict requirements, so you can do things in unicode2.__init__ if you want, but calling unicode.__init__ is unnecessary (and probably won't do what you think it would do anyway).
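To illustrate the point about `__new__` versus `__init__`, here is a minimal sketch. It assumes Python 3, where `str` plays the role of Python 2's `unicode`; the `Tagged` class and its `value_at_init` attribute are hypothetical names used only for this example.

```python
# Python 3 sketch: `str` stands in for Python 2's `unicode` (an assumption).

# str defines its own __new__ but does not define __init__ itself;
# the __init__ it exposes is the one inherited from object.
print('__new__' in vars(str))
print('__init__' in vars(str))


class Tagged(str):  # hypothetical subclass for illustration
    def __init__(self, value):
        # By the time __init__ runs, __new__ has already fixed the
        # string's value, so there is nothing to forward to the base class.
        self.value_at_init = str(self)


t = Tagged('foo')
print(t.value_at_init)
```

The subclass can still use `__init__` for its own bookkeeping; it just has no reason to pass the constructor argument up the chain.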

A better solution is probably to do your customized logic in your own __new__ method:

class unicode2(unicode):
    def __new__(cls, value):
        # optionally do stuff to value here
        self = super(unicode2, cls).__new__(cls, value)
        # optionally do stuff to self here
        return self

You can make your class immutable too, if you want, by giving it a __setattr__ method that always raises an exception. You might also want to give the class a __slots__ attribute, which saves memory by omitting the per-instance __dict__.
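A rough sketch of that idea, again assuming Python 3 with `str` in place of `unicode`. The class name `FrozenText` and the choice to strip whitespace in `__new__` are illustrative assumptions, not part of the original question.

```python
# Python 3 sketch: `str` stands in for Python 2's `unicode` (an assumption).

class FrozenText(str):  # hypothetical name
    # Empty __slots__ means instances get no per-instance __dict__.
    __slots__ = ()

    def __new__(cls, value):
        # Customisation happens here, before the immutable value is fixed.
        return super().__new__(cls, value.strip())

    def __setattr__(self, name, value):
        # Reject every attribute assignment to keep instances immutable.
        raise AttributeError('FrozenText instances are immutable')


text = FrozenText('  foo  ')

try:
    text.label = 'x'
    assignment_failed = False
except AttributeError:
    assignment_failed = True

print(text, assignment_failed)
```

With empty `__slots__` the assignment would fail even without the custom `__setattr__`; the explicit method mainly gives a clearer error message.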

Blckknght
  • Good answer, but your example should replace `str2` with `unicode2`. Or just call directly as `unicode.__new__(cls, ...`. – Keith Feb 09 '13 at 01:36
  • Whoops, that's a typo that came from copying from a Python 3 implementation I tried (there is no `unicode` class there). I'll update. – Blckknght Feb 09 '13 at 02:38
  • Thanks for the answer, but I don't think it explains why I'm able to call `__init__` on a unicode instance but not a unicode-subclass instance. While unicode is immutable, it's still conceivable that something happens during `__init__`. Without knowledge of the underlying implementation, I ran the first three lines of code and confirmed that `unicode.__init__` does appear to exist and accept parameters, so I want to avoid _suppressing_ that method in a subclass. The question isn't asking how to subclass unicode, but why `unicode.__init__` doesn't work only in the subclass. – Jason R. Coombs Feb 09 '13 at 13:12
  • I'm pretty sure `unicode.__init__` is inherited from `object` (this is why the warning comes from `object.__init__`). The `DeprecationWarning` is there because it has become an error in Python 3 to call `__init__` with any arguments (other than self). I'm not sure how Python decides to call an overridden `__init__` with the arguments passed to the constructor, but not `object.__init__`. – Blckknght Feb 09 '13 at 13:36
  • Ah, I found it. Via [the answer to this question](http://stackoverflow.com/questions/8611712/what-does-objects-init-do-in-python), I looked in [the source](http://hg.python.org/cpython/file/2.7/Objects/typeobject.c#l2814). A long comment at the top explains when `object`'s `__new__` and `__init__` will complain (or raise an exception) about excess arguments. – Blckknght Feb 09 '13 at 13:50
  • @Blckknght Great find. The comment and source do in fact clarify. Mutable types are handled differently from immutable types in how an overridden `__init__` is handled. In particular, for the `unicode` type, `tp_new != object_new` but `tp_init == object_init`, so nothing happens in `object_init`, but for unicode2 and unicode3, `tp_init != object_init`, so the warning/error occurs. Indeed, it seems "mutability" is less a factor than the presence/absence of `__new__` and `__init__` methods. Thanks! – Jason R. Coombs Feb 09 '13 at 14:59
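
The behaviour worked out in these comments can be reproduced with a short sketch. It assumes Python 3, where `str` takes the place of `unicode` and the excess-argument case raises `TypeError` instead of a warning; the class names `Forwarding` and `NotForwarding` are hypothetical.

```python
# Python 3 sketch: `str` stands in for Python 2's `unicode` (an assumption).

class Forwarding(str):  # hypothetical: overrides __init__ and forwards the argument
    def __init__(self, value):
        super().__init__(value)


class NotForwarding(str):  # hypothetical: overrides __init__, forwards nothing
    def __init__(self, value):
        super().__init__()


# A plain str does not override __init__, so object.__init__ quietly
# tolerates the extra argument.
plain = str('foo')
plain_result = plain.__init__('foo')

# A subclass that overrides __init__ and passes the argument on is rejected.
try:
    Forwarding('foo')
    forwarding_error = None
except TypeError as exc:
    forwarding_error = exc

# Calling the inherited __init__ with no extra arguments is accepted.
ok = NotForwarding('foo')

print(plain_result, type(forwarding_error).__name__, ok)
```

In practice this suggests that a subclass overriding `__init__` should either not call the base `__init__` at all or call it without arguments.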