
Python's print statement isn't using __repr__, __unicode__ or __str__ when printing an instance of my unicode subclass. Any clues as to what I am doing wrong?

Here is my code:

Using Python 2.5.2 (r252:60911, Oct 13 2009, 14:11:59)

>>> class MyUni(unicode):
...     def __repr__(self):
...         return "__repr__"
...     def __unicode__(self):
...         return unicode("__unicode__")
...     def __str__(self):
...         return str("__str__")
...      
>>> s = MyUni("HI")
>>> s
__repr__
>>> print s
HI

I'm not sure if this is an accurate approximation of the above, but just for comparison:

>>> class MyUni(object):
...     def __new__(cls, s):
...         return super(MyUni, cls).__new__(cls)
...     def __repr__(self):
...         return "__repr__"
...     def __unicode__(self):
...         return unicode("__unicode__")
...     def __str__(self):
...         return str("__str__")
...
>>> s = MyUni("HI")
>>> s
__repr__
>>> print s
__str__

[EDITED...] It sounds like the best way to get a string object that passes isinstance(instance, basestring), offers control over its unicode return values, and has a unicode-style repr is:

>>> class UserUnicode(str):
...     def __repr__(self):
...         return "u'%s'" % super(UserUnicode, self).__str__()
...     def __str__(self):
...         return super(UserUnicode, self).__str__()
...     def __unicode__(self):
...         return unicode(super(UserUnicode, self).__str__())
...
>>> s = UserUnicode("HI")
>>> s
u'HI'
>>> print s
HI
>>> len(s)
2

The __str__ and __repr__ above add nothing to this example, but the idea is to show the pattern explicitly so it can be extended as needed.

Just to prove that this pattern grants control:

>>> class UserUnicode(str):
...     def __repr__(self):
...         return "u'%s'" % "__repr__"
...     def __str__(self):
...         return "__str__"
...     def __unicode__(self):
...         return unicode("__unicode__")
... 
>>> s = UserUnicode("HI")
>>> s
u'__repr__'
>>> print s
__str__

Thoughts?

Alasdair
Rafe
  • Is your code really indented like the first example? – GreenMatt Mar 28 '13 at 16:54
  • I had to guess as to what your question is. If I got it wrong, please do update your post to *include an actual, clear question*. – Martijn Pieters Mar 28 '13 at 16:54
  • Though this is a nice gotcha, I would like to ask why in h*** you would want to subclass str or unicode? I mean, the data will be immutable, so the resulting object will be quite useless. – Kijewski Mar 28 '13 at 17:34
  • I added some more after [Edited...]. Feels gross but I don't think it breaks any Pythonic expectations. repr is a string representation that could be used to build a unicode object if needed, right? – Rafe Mar 28 '13 at 22:52
  • @Kay: Not useless at all. I've used it to create a name-convention object model for a 3D graphics software package. Basically making a name a special type of string that encapsulates utilities for working with the convention but can still be passed to the native API transparently. The 3D app is mostly unicode so I was trying to be consistent. However, in the case of this thread, I am wrapping an API object and I want the return value of my class to be dynamic, so it only mimics a true string - just has to pass isinstance(instance, basestring)...don't ask... – Rafe Mar 28 '13 at 22:59

2 Answers

The problem is that print doesn't respect __str__ on unicode subclasses.

From PyFile_WriteObject, used by print:

int
PyFile_WriteObject(PyObject *v, PyObject *f, int flags)
{
...
    if ((flags & Py_PRINT_RAW) &&
        PyUnicode_Check(v) && enc != Py_None) {
        char *cenc = PyString_AS_STRING(enc);
        char *errors = fobj->f_errors == Py_None ?
          "strict" : PyString_AS_STRING(fobj->f_errors);
        value = PyUnicode_AsEncodedString(v, cenc, errors);
        if (value == NULL)
            return -1;
...

PyUnicode_Check(v) returns true if v's type is unicode or a subclass. This code therefore writes unicode objects directly, without consulting __str__.
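
As a rough illustration only (a simplified Python model, not CPython's actual code), the branch above behaves like an isinstance() test, whereas an exact-type test would leave subclasses to the generic str() path. The names text_type, Shout, render_subclass_check and render_exact_check are made up for this sketch, and the text_type alias is only there so the snippet also runs on Python 3, where unicode no longer exists.

```python
import sys

# Alias so the sketch runs on Python 2 (unicode) and Python 3 (str).
text_type = str if sys.version_info[0] >= 3 else unicode  # noqa: F821


class Shout(text_type):
    """Hypothetical text subclass that overrides __str__."""

    def __str__(self):
        return "__str__"


def render_subclass_check(value):
    # Models PyUnicode_Check: matches unicode and any subclass, so the raw
    # character data is taken directly and __str__ is never consulted.
    if isinstance(value, text_type):
        return value[:]  # plain copy of the data; stands in for the encoding step
    return str(value)


def render_exact_check(value):
    # Models an exact-type test: subclasses fall through to str(),
    # which does call an overridden __str__.
    if type(value) is text_type:
        return value[:]
    return str(value)


s = Shout("HI")
print(render_subclass_check(s))
print(render_exact_check(s))
```

In this model the first call yields the underlying data (HI) and the second yields the override (__str__), which is the difference between the two kinds of type check.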

Note that subclassing str and overriding __str__ works as expected:

>>> class mystr(str):
...     def __str__(self): return "str"
...     def __repr__(self): return "repr"
... 
>>> print mystr()
str

as does calling str or unicode explicitly:

>>> class myuni(unicode):
...     def __str__(self): return "str"
...     def __repr__(self): return "repr"
...     def __unicode__(self): return "unicode"
... 
>>> print myuni()

>>> str(myuni())
'str'
>>> unicode(myuni())
u'unicode'

I believe this could be construed as a bug in Python as currently implemented.

nneonneo

You are subclassing unicode.

It'll never call __unicode__ because it already is unicode. What happens here instead is that the object is encoded to the stdout encoding:

>>> s.encode('utf8')
'HI'

except that it'll use direct C calls instead of the .encode() method. This is the default behaviour for print for unicode objects.

The print statement calls PyFile_WriteObject, which in turn calls PyUnicode_AsEncodedString when handling a unicode object. The latter then defers to an encoding function for the current encoding, and these use the Unicode C macros to access the data structures directly. You cannot intercept this from Python.

What you are looking for is an __encode__ hook, I guess. Since this is already a unicode subclass, print only needs to encode it; it doesn't need to convert it to unicode again, nor can it convert it to a string without encoding it explicitly. You'd have to take this up with the Python core developers to see if an __encode__ hook makes sense.
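
To make the idea concrete, here is a purely hypothetical sketch of what such a hook could look like if you applied it yourself in Python code. No Python version defines __encode__, and print will not call it; encode_for_output, Loud and text_type are names invented for this example (text_type is just an alias so the snippet also runs on Python 3).

```python
import sys

# Alias so the sketch runs on Python 2 (unicode) and Python 3 (str).
text_type = str if sys.version_info[0] >= 3 else unicode  # noqa: F821


def encode_for_output(value, encoding="utf-8", errors="strict"):
    """Encode text, preferring a hypothetical __encode__ hook on its type."""
    hook = getattr(type(value), "__encode__", None)
    if hook is not None:
        return hook(value, encoding, errors)
    # No hook defined: fall back to the ordinary encoding path.
    return value.encode(encoding, errors)


class Loud(text_type):
    """Hypothetical subclass that customises how it is encoded."""

    def __encode__(self, encoding, errors):
        # Encode with the base-type method, then upper-case the bytes.
        return text_type.encode(self, encoding, errors).upper()


print(encode_for_output(Loud("hi")))
print(encode_for_output(text_type("hi")))
```

The helper has to be called explicitly at the point of output; it only shows where a subclass could take control if the encoding step were exposed to Python code.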

Martijn Pieters
  • Since it doesn't do this for `str` subclasses, I think this is a Python bug. See my answer. – nneonneo Mar 28 '13 at 17:22
  • @nneonneo: not sure that I agree just yet. :-) – Martijn Pieters Mar 28 '13 at 17:22
  • Hm, why would it not be a bug? Treatment of `str` and `unicode` should be relatively uniform in Python 2.7. – nneonneo Mar 28 '13 at 17:23
  • @nneonneo: Turning `unicode` into a `str` for printing requires encoding. Turning anything else into a string requires calling `__str__`. Note the second example in the OP post; printing a custom object uses `__str__`, **not** `__unicode__`. – Martijn Pieters Mar 28 '13 at 17:25
  • Fine for `unicode` itself, but makes less sense for `unicode` subclasses. Feels like the code should call `PyUnicode_CheckExact`. – nneonneo Mar 28 '13 at 17:26
  • @nneonneo: If this is a real use case (which I doubt), there should be some kind of `__encode__` hook instead. – Martijn Pieters Mar 28 '13 at 17:28
  • I feel it is a bug because I should be able to override it. If the base class is bypassing the expected behavior, it feels like something should be exposed to account for subclass overrides. – Rafe Mar 28 '13 at 22:11
  • @Rafe: but `__unicode__` is for converting something *to* unicode. `unicode(yourtype)` certainly will call it. But `print` is not converting, it is *encoding* instead. You may want to override the encoding behaviour, but there currently is no hook for that. You'd have to discuss that on the Python dev or ideas list instead, as a new feature. – Martijn Pieters Mar 28 '13 at 22:13
  • @Martijn: That makes sense, but isn't the issue that the unicode subclass is ignoring `__str__` when printing? That is where it feels like a bug to me. If it didn't ignore `__str__` I'd be happy. Am I missing your point still? – Rafe Mar 28 '13 at 23:28
  • @Rafe: So what encoding should `__str__` use then? You have *no way* of telling that method what encoding to use... – Martijn Pieters Mar 28 '13 at 23:33