Python doctests and unicode

Question

I have a code base that runs unchanged in Python 2.7 and 3.2+, but the doctests in the documentation rst files are giving me a headache. When I run them in Python 2, I get `UnicodeEncodeError: 'ascii' codec can't encode character u'\xb2' in position 16: ordinal not in range(128)`. If I add

.. testsetup:: *

   from __future__ import unicode_literals

then I get a lot of errors like

Expected:
    'something'
Got:
    u'something'

Is there a way to write doctests containing unicode characters in the rst files so that they work unchanged in Python 2.7 and 3.2+?

Is it much work to actually specify the encoding to use when you run them? — Paulo Bu, Jul 03 '13 at 03:24
Well, I don't see code so it would be hard but, do you use the function `doctest.testfile` somewhere in your code? If so, please add to the question. — Paulo Bu, Jul 03 '13 at 03:28
I don't. The doctests are run when I build the documentation using sphinx (`make doctest`). I get a similar error when registering on PyPI. I was hoping to find a solution based on changing the rst file, not the call. — Hernan, Jul 03 '13 at 03:41
Please add the code that gives you problems. And a traceback would be informative as well. — Lennart Regebro, Jul 03 '13 at 04:34
possible duplicate of [Doctest fails due to unicode leading u](http://stackoverflow.com/questions/31243623/doctest-fails-due-to-unicode-leading-u) — Kurt Bourbaki, Jul 18 '15 at 13:44

score 1 · Answer 1 · answered Jul 03 '13 at 06:49

Make sure you are using Python 3.3 or later. It reintroduced explicit `u'unicode literals'` (that is, with the `u` prefix) to ease the transition between Python 2 code with unicode literals and Python 3. See http://docs.python.org/3/whatsnew/3.3.html#pep-414-explicit-unicode-literals
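As a minimal sketch of what PEP 414 allows (assuming Python 3.3 or later; the variable names are only illustrative), the prefixed and unprefixed literals produce the same value and type, and the representation never shows the prefix:

```python
# Assumes Python 3.3+; on Python 2 the two literals have different types.
plain = 'caf\xe9'      # text literal without a prefix
prefixed = u'caf\xe9'  # same literal with the explicit u prefix (PEP 414)

same_value = (plain == prefixed)             # the prefix does not change the value
same_type = (type(plain) is type(prefixed))  # both are str on Python 3
shown = repr(prefixed)                       # repr carries no leading u on Python 3
```

So the prefix helps the source code parse on both versions, but it does not change what the interpreter echoes back.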

pepr


For me, adding the `u'some string'` prefix makes the doctest fail even with Python 3.5 – matth Feb 10 '17 at 11:54
@matth The ignored `u` prefix for string literals is still valid in both Python 3.5 and Python 3.6 (see https://docs.python.org/3.5/reference/lexical_analysis.html#grammar-token-stringliteral). Can you create a minimal example that shows the problem? – pepr Feb 10 '17 at 12:52
I asked a new question here, it includes an example: http://stackoverflow.com/questions/42158733/unicode-literals-and-doctest-in-python-2-7-and-python-3-5 – matth Feb 10 '17 at 12:53
also see http://stackoverflow.com/questions/13473971/multi-version-support-for-python-doctests#comment18432670_13473971 – matth Feb 10 '17 at 12:54

As Martijn Pieters noted below the later question, there is no simple way to bend `doctest` to do this. The reason is that the `u'string literal'` syntax was reintroduced in Python 3.3 only so that old sources could be reused easily. It is a compromise: the `u` prefix is simply ignored, so the string value is the same whether you use the prefix or not. On the other hand, the string representation in Python 3 is always shown without the `u` prefix. `doctest` is too simple to resolve that compromise without additional work, because it only compares the textual representation of each result with the expected output. – pepr Feb 10 '17 at 14:05
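To illustrate the point about comparing representations, here is a rough sketch, assuming Python 3. The functions `label`, `label_repr` and `count_failures` are hypothetical names made up for this example. Printing the value instead of echoing it is a commonly used workaround rather than something proposed in this thread, and it is only shown here for ASCII text.

```python
import doctest


def label():
    """Return a short text label (hypothetical example function).

    print() shows the text itself rather than its repr, so the expected
    output contains neither quotes nor a ``u`` prefix.

    >>> print(label())
    something
    """
    return u'something'


def label_repr():
    """Same value, but the expected output is the Python 2 style repr.

    >>> label_repr()
    u'something'
    """
    return u'something'


def count_failures(func):
    """Run the doctests in func's docstring and return how many failed."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    failed = 0
    for test in finder.find(func, func.__name__, globs={func.__name__: func}):
        # out=... swallows the failure report instead of writing it to stdout
        failed += runner.run(test, out=lambda text: None).failed
    return failed
```

On Python 3, `count_failures(label)` should report no failures, while `count_failures(label_repr)` reports one, because the interpreter shows `'something'` where the docstring expects `u'something'`.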
