Python __str__ versus __unicode__

Question

Is there a Python convention for when you should implement __str__() versus __unicode__()? I've seen classes override __unicode__() more frequently than __str__(), but it doesn't appear to be consistent. Are there specific rules for when it is better to implement one versus the other? Is it necessary or good practice to implement both?

score 271 · Accepted Answer · answered Aug 20 '09 at 16:04

__str__() is the old method -- it returns bytes. __unicode__() is the new, preferred method -- it returns characters. The names are a bit confusing, but in 2.x we're stuck with them for compatibility reasons. Generally, you should put all your string formatting in __unicode__(), and create a stub __str__() method:

def __str__(self):
    return unicode(self).encode('utf-8')

In 3.0, str contains characters, so the same methods are named __bytes__() and __str__(). These behave as expected.
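As a rough Python 3 illustration of that naming, the sketch below puts the formatting in __str__() and adds a stub __bytes__() that encodes it. The class `Greeting` and the choice of UTF-8 are assumptions made for this example, not part of the original answer.

```python
# Python 3 only: __str__ returns text (str), __bytes__ returns bytes.
class Greeting:
    """Hypothetical example class, used only for illustration."""

    def __init__(self, name):
        self.name = name

    def __str__(self):
        # All string formatting lives in the text-returning method.
        return "Hello, {}".format(self.name)

    def __bytes__(self):
        # Stub that encodes the text form; UTF-8 is an assumption.
        return str(self).encode("utf-8")


g = Greeting("Zo\u00eb")
text = str(g)    # characters
raw = bytes(g)   # the same content as UTF-8 bytes
print(type(text).__name__, type(raw).__name__)
```

This mirrors the Python 2 layout shown above, with the roles of the two methods shifted one step: the character-returning method holds the logic and the byte-returning method is a thin wrapper.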

John Millikin

2

So you mean creating both __unicode__ and __str__ methods, or just keeping strings in _(u"") and creating __string__ (without the unicode method)? – muntu Sep 03 '10 at 12:59
14

Is there any pitfall in implementing only one of them? What happens when you only implement `__unicode__` and then do `str(obj)`? – RickyA Feb 06 '13 at 09:09
9

`unicode` raises a `NameError` on Python 3. Is there a simple pattern that works across both 2 and 3? – bradley.ayers Mar 24 '13 at 08:05
1

@bradley.ayers the [`future`](https://pypi.python.org/pypi/future) package also provides [`python_2_unicode_compatible`](http://python-future.org/compatible_idioms.html#custom-str-methods) without having Django as a dependency. – Kyle Pittman May 13 '16 at 13:22
2

It depends: Python 3 does not use __unicode__ but __str__ instead ;) whereas Python 2 uses __unicode__. – Eddwin Paz Oct 03 '17 at 01:04
1

@jfs the link is broken, here is a working link: [python_2_unicode_compatible](https://docs.djangoproject.com/en/1.11/_modules/django/utils/six/#python_2_unicode_compatible) – j-i-l Jun 27 '18 at 09:15
1

To ensure compatibility, **return `unicode(self).encode(locale.getpreferredencoding())`** instead. There are still plenty of systems that do not use 'utf8' as the default system encoding. – Alex Quinn Jun 26 '19 at 14:55

score 24 · Answer 2 · answered Aug 20 '09 at 16:04

If I didn't especially care about micro-optimizing stringification for a given class I'd always implement __unicode__ only, as it's more general. When I do care about such minute performance issues (which is the exception, not the rule), having __str__ only (when I can prove there never will be non-ASCII characters in the stringified output) or both (when both are possible), might help.

I think these are solid principles, but in practice it's very common to KNOW there will be nothing but ASCII characters without making any effort to prove it (e.g. the stringified form only has digits, punctuation, and maybe a short ASCII name ;-), in which case it's quite typical to move directly to the "just __str__" approach (but if a programming team I worked with proposed a local guideline to avoid that, I'd be +1 on the proposal, as it's easy to err in these matters AND "premature optimization is the root of all evil in programming" ;-).
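If you would rather check than assume that a stringified form is ASCII-only, a small helper in a unit test is one option. This is only a sketch: `is_ascii` and the `Version` class are hypothetical names invented for this example.

```python
def is_ascii(text):
    """Return True if text can be encoded as ASCII without loss."""
    try:
        text.encode("ascii")
    except UnicodeError:
        # Covers UnicodeEncodeError (and UnicodeDecodeError on Python 2).
        return False
    return True


class Version(object):
    """Hypothetical class whose string form is digits and punctuation."""

    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def __str__(self):
        return "v{}.{}".format(self.major, self.minor)


ascii_ok = is_ascii(str(Version(2, 7)))   # digits and punctuation only
non_ascii_ok = is_ascii(u"caf\u00e9")     # contains a non-ASCII character
print(ascii_ok, non_ascii_ok)
```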

In python 2.6.2, I recently got tripped up because instances of a particular built-in Exception subclass gave different results with str(e) and unicode(e). str(e) gave user-friendly output; unicode(e) gave different, user-unfriendly output. Is this considered buggy behavior? The class is UnicodeDecodeError; I didn't name it up front to avoid confusion -- the fact that the exception is unicode-related is not particularly relevant. — Paul Du Bois, Mar 14 '12 at 23:50

score 15 · Answer 3 · answered Aug 20 '09 at 16:00

With the world getting smaller, chances are that any string you encounter will contain Unicode eventually. So for any new apps, you should at least provide __unicode__(). Whether you also override __str__() is then just a matter of taste.

Aaron Digulla

5

If you're writing Python 3 code, defining `__unicode__` does nothing. – Boris Verkhovskiy Sep 25 '20 at 15:42

score 9 · Answer 4 · answered Jun 18 '17 at 21:19

If you are working in both python2 and python3 in Django, I recommend the python_2_unicode_compatible decorator:

Django provides a simple way to define __str__() and __unicode__() methods that work on Python 2 and 3: you must define a __str__() method returning text and apply the python_2_unicode_compatible() decorator.

As noted in earlier comments on another answer, some versions of future.utils also support this decorator. On my system, I needed to install a newer future module for Python 2 and install future for Python 3. After that, here is a functional example:

#! /usr/bin/env python

from future.utils import python_2_unicode_compatible
from sys import version_info

@python_2_unicode_compatible
class SomeClass():
    def __str__(self):
        return "Called __str__"


if __name__ == "__main__":
    some_inst = SomeClass()
    print(some_inst)
    if (version_info > (3,0)):
        print("Python 3 does not support unicode()")
    else:
        print(unicode(some_inst))

Here is example output (where venv2/venv3 are virtualenv instances):

~/tmp$ ./venv3/bin/python3 demo_python_2_unicode_compatible.py 
Called __str__
Python 3 does not support unicode()

~/tmp$ ./venv2/bin/python2 demo_python_2_unicode_compatible.py 
Called __str__
Called __str__
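For readers curious about what such a decorator does, here is a minimal hand-rolled sketch. It is not the source of the Django or future implementation; the name `py2_unicode_compatible`, the `Point` class, and the fixed UTF-8 encoding are assumptions made for illustration.

```python
import sys


def py2_unicode_compatible(cls):
    """Hypothetical stand-in for a python_2_unicode_compatible decorator."""
    if sys.version_info[0] == 2:
        # Python 2: the text-returning method becomes __unicode__,
        # and __str__ is replaced by a wrapper that encodes to UTF-8.
        cls.__unicode__ = cls.__str__
        cls.__str__ = lambda self: self.__unicode__().encode("utf-8")
    # Python 3: __str__ already returns text, so the class is unchanged.
    return cls


@py2_unicode_compatible
class Point(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __str__(self):
        # Always return text; the decorator handles the Python 2 split.
        return u"Point({}, {})".format(self.x, self.y)


label = str(Point(1, 2))
print(label)
```

The design keeps a single text-returning method in the class body and derives the byte-returning one mechanically, which is the same division of labour the accepted answer recommends for Python 2.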

score 4 · Answer 5 · answered Jun 26 '19 at 15:36

Python 2: Implement __str__() only, and return a unicode.

When __unicode__() is omitted and someone calls unicode(o) or u"%s"%o, Python calls o.__str__() and converts to unicode using the system encoding. (See documentation of __unicode__().)

The opposite is not true. If you implement __unicode__() but not __str__(), then when someone calls str(o) or "%s"%o, Python returns repr(o).
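The fallback to repr() can be checked directly. The sketch below uses a made-up class, `OnlyUnicode`, that defines __unicode__() but not __str__(). It is written so that it also runs on Python 3, where __unicode__() is never consulted and the same fallback applies.

```python
class OnlyUnicode(object):
    """Hypothetical class that defines __unicode__ but no __str__."""

    def __unicode__(self):
        return u"custom text"


o = OnlyUnicode()

# With no __str__ defined, both conversions fall back to repr(o).
via_str = str(o)
via_percent = "%s" % o
print(via_str == repr(o), via_percent == repr(o))
```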

Rationale

Why would it work to return a unicode from __str__()?
If __str__() returns a unicode, Python automatically converts it to str using the system encoding.

What's the benefit?
① It frees you from worrying about what the system encoding is (i.e., locale.getpreferredencoding(…)). Personally, I find that messy, and I think it's something the system should take care of anyway. ② If you are careful, your code may come out cross-compatible with Python 3, in which __str__() returns unicode.

Isn't it deceptive to return a unicode from a function called __str__()?
A little. However, you might be already doing it. If you have from __future__ import unicode_literals at the top of your file, there's a good chance you're returning a unicode without even knowing it.

What about Python 3?
Python 3 does not use __unicode__(). However, if you implement __str__() so that it returns unicode under either Python 2 or Python 3, then that part of your code will be cross-compatible.

What if I want unicode(o) to be substantively different from str()?
Implement both __str__() (possibly returning str) and __unicode__(). I imagine this would be rare, but you might want substantively different output (e.g., ASCII versions of special characters, like ":)" for u"☺").

I realize some may find this controversial.
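For comparison, this is roughly the manual encoding step that the approach above avoids. It is a sketch only: the sample text and the "replace" error handler are assumptions, and the reported encoding depends on the machine it runs on.

```python
import locale

text = u"caf\u00e9"

# Ask the platform which encoding it prefers (for example UTF-8 or cp1252).
encoding = locale.getpreferredencoding(False)

# Encode by hand; "replace" substitutes characters the encoding cannot
# represent instead of raising UnicodeEncodeError.
encoded = text.encode(encoding, "replace")
print(encoding)
```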

score 2 · Answer 6 · answered Mar 14 '19 at 22:55

It's worth pointing out to those unfamiliar with the __unicode__ function some of the default behaviors surrounding it back in Python 2.x, especially when defined side by side with __str__.

# Python 2 only: A is an old-style class, so repr() prints "instance at".
class A:
    def __init__(self):
        self.x = 123
        self.y = 23.3

    # def __str__(self):
    #     return "STR      {}      {}".format(self.x, self.y)

    def __unicode__(self):
        return u"UNICODE  {}      {}".format(self.x, self.y)

a1 = A()
a2 = A()

# print() goes through str(), which falls back to repr() without __str__.
print("__repr__ checks")
print(a1)
print(a2)

print("\n__str__ vs __unicode__ checks")
print(str(a1))
print(unicode(a1))
print("{}".format(a1))
print(u"{}".format(a1))

yields the following console output...

__repr__ checks
<__main__.A instance at 0x103f063f8>
<__main__.A instance at 0x103f06440>

__str__ vs __unicode__ checks
<__main__.A instance at 0x103f063f8>
UNICODE  123      23.3
<__main__.A instance at 0x103f063f8>
UNICODE  123      23.3

Now, when I uncomment the __str__ method, the output becomes:

__repr__ checks
STR      123      23.3
STR      123      23.3

__str__ vs __unicode__ checks
STR      123      23.3
UNICODE  123      23.3
STR      123      23.3
UNICODE  123      23.3
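The first pair of print calls above is labelled "__repr__ checks", yet its output switches to the STR text once __str__ is defined. That suggests print goes through __str__ and only falls back to __repr__ when __str__ is missing. A small sketch of that fallback, written to run on Python 3 as well, with hypothetical class names:

```python
class WithReprOnly(object):
    """Hypothetical class that defines only __repr__."""

    def __repr__(self):
        return "REPR"


class WithBoth(WithReprOnly):
    """Adds __str__ on top of the inherited __repr__."""

    def __str__(self):
        return "STR"


repr_only = str(WithReprOnly())       # no __str__, so __repr__ is used
both = str(WithBoth())                # __str__ takes precedence
formatted = "{}".format(WithBoth())   # format() also goes through __str__
still_repr = repr(WithBoth())         # repr() is unaffected by __str__
print(repr_only, both, formatted, still_repr)
```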
