
Running Python in a standard GNU terminal emulator on Ubuntu 14.04, I get the expected behavior when typing interactively:

>>> len('tiθ')
4
>>> len(u'tiθ')
3

The same thing happens when running an explicitly UTF-8-encoded script in Spyder:

# -*- coding: utf-8 -*-
print(len('tiθ'))
print(len(u'tiθ'))

...gives the following output, whether I run it in a new dedicated interpreter or in Spyder's default interpreter (shown here):

>>> runfile('/home/dan/Desktop/mwe.py', wdir=r'/home/dan/Desktop')
4
3

But when typing interactively in a Python console within Spyder:

>>> len('tiθ')
4
>>> len(u'tiθ')
4

This issue has been brought up elsewhere, but that question is about differences between Windows and Linux. Here, I'm getting different results in different consoles on the same system, and the Python startup messages in the terminal emulator and in the Spyder console are identical:

Python 2.7.6 (default, Mar 22 2014, 22:59:56) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.

What is going on here, and how can I get Python within Spyder to behave like Python in the shell with regard to Unicode strings? @martijn-pieters commented on this question that

Spyder does all sorts of things to break a normal Python environment.

But I'm hoping there's a way to un-break this particular feature, because the current behavior makes it really hard to debug scripts in the IDE: I can't rely on interactively typed commands to give the same results as a whole script run with its coding: utf-8 declaration.

UPDATES

In the GNU terminal:

>>> repr(u'tiθ')
"u'ti\\u03b8'"
>>> import sys
>>> sys.stdin.encoding
'UTF-8'
>>> sys.getdefaultencoding()
'ascii'

In Spyder console:

>>> repr(u'tiθ')
"u'ti\\xce\\xb8'"
>>> import sys
>>> sys.stdin.encoding  # returns None
>>> sys.getdefaultencoding()
'UTF-8'

So knowing that, can I convince Spyder to behave like the GNU terminal?
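The repr() outputs above can be reproduced with a short Python 3 sketch (the question itself uses Python 2, so this only mirrors the byte-level behaviour). θ is U+03B8, and its UTF-8 encoding is the two bytes 0xCE 0xB8. If a console decodes those bytes as Latin-1, each byte becomes its own code point, so the string gains one character. That Latin-1 is the codec Spyder actually used is an assumption inferred from the \xce\xb8 escapes, in line with Martijn Pieters' comment below.

```python
# Hypothetical reproduction: ASSUMES the console decodes UTF-8 keyboard bytes as Latin-1.
typed = 'tiθ'                        # what the user meant: 3 code points
raw = typed.encode('utf-8')          # what the terminal sends: b'ti\xce\xb8'
correct = raw.decode('utf-8')        # decoding with the matching codec
garbled = raw.decode('latin-1')      # decoding with the wrong codec (mojibake)

print(len(raw))        # 4 bytes
print(len(correct))    # 3 characters, like u'tiθ' in the GNU terminal
print(len(garbled))    # 4 characters, like u'tiθ' in the Spyder console
print(ascii(garbled))  # 'ti\xce\xb8', the same escapes Spyder's repr() showed
```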

drammock
  • What does `repr(u'tiθ')` produce? The issue here is with the *input* settings for that console. Here it is your keyboard input, not a source file, that produces the bytes Python has to decode to Unicode text. – Martijn Pieters Oct 11 '14 at 08:08
  • In the interactive console, it appears you are using a UTF-8 terminal, and Python can detect that, so that `u'tiθ'` just works; the bytes are then not read from a regular file but from the terminal environment instead. – Martijn Pieters Oct 11 '14 at 08:10
  • @MartijnPieters see update re: the results of `repr()` – drammock Oct 11 '14 at 08:16
  • Your keyboard is producing UTF-8 (the `repr('tiθ')` will match in bytes) but Spyder thinks it is Latin-1 (or CP-1252, but you are on Linux, not Windows, so that is less likely). Not sure how to configure the console to change that. Most likely `import sys; sys.stdin.encoding` will confirm this. – Martijn Pieters Oct 11 '14 at 08:18
  • Indeed, `sys.stdin.encoding` returns `UTF-8` in the GNU terminal, but returns nothing within Spyder. Maybe I can set that value in a custom Spyder startup script... – drammock Oct 11 '14 at 08:22
  • I kind of doubt that, but you could try to set the `PYTHONIOENCODING` environment variable perhaps; that'd affect both input and output however. – Martijn Pieters Oct 11 '14 at 08:26
  • And thanks for confirming again that `sys.getdefaultencoding()` has been altered by Spyder. That's like tying a stick to your leg after breaking it and keeping on walking, rather than go to the emergency room and have it set properly. It'll fix Unicode problems in the short term, but *boy* is it going to hurt having to re-set the bone later on. – Martijn Pieters Oct 11 '14 at 08:28
  • A brief look at Unicode-related bug reports in the Spyder issue tracker doesn't give me much hope; the developers need to get their input/output and internal Unicode handling sorted out, as they have so far made a bit of a muddle of it. – Martijn Pieters Oct 11 '14 at 08:47
  • @MartijnPieters Could you suggest what we need to do to fix this situation? We are just a bunch of scientists trying to create a scientist-friendly IDE, so this unicode/bytes string issue is hard for us to understand and get right. I really mean it, please give us some simple suggestions, at least so there are no more questions like this one on SO :-) – Carlos Cordoba Oct 11 '14 at 15:35
  • @CarlosCordoba: Start by taking out the `sys.setdefaultencoding()` call; that has nothing to do with input and output, but does mask errors in people's code where they rely on implicit encoding / decoding. Next, you'll have to study up on how to open `sys.stdin` for your console with the same encoding used as the GUI input source; that's not something I can help with though. Perhaps the PyCharm community edition codebase holds clues as to how they tackle this. – Martijn Pieters Oct 11 '14 at 15:38
  • @CarlosCordoba: I believe you are currently setting a `# coding` comment in the console; that doesn't apply to the console however, only to source files read by the interpreter. – Martijn Pieters Oct 11 '14 at 15:39
  • @MartijnPieters thanks. I understand the `sys.setdefaultencoding()` point and will remove it. By `# coding` comment, do you mean the one we have in our `sitecustomize`? I don't quite understand the `sys.stdin` suggestion, but I'll see what I can do about it. Thanks a lot for your suggestions :-) – Carlos Cordoba Oct 11 '14 at 15:57
  • @CarlosCordoba: I didn't check versions; but yes, I saw that in a revision of your `sitecustomize` module. – Martijn Pieters Oct 11 '14 at 16:07
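The PYTHONIOENCODING suggestion from the comments can be checked outside Spyder with a sketch like the one below. It launches a child interpreter with the variable set and a non-terminal stdin, then reports what the child chose for sys.stdin.encoding. It is written for Python 3.7+ (for capture_output); whether Spyder forwards this variable to its consoles is not confirmed here, and, as noted above, the variable also affects stdout and stderr. The helper name child_stdin_encoding is illustrative only.

```python
import os
import subprocess
import sys

def child_stdin_encoding(io_encoding):
    """Hypothetical helper: return sys.stdin.encoding as seen by a child Python
    started with PYTHONIOENCODING set to `io_encoding`."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    result = subprocess.run(
        [sys.executable, '-c', 'import sys; print(sys.stdin.encoding)'],
        env=env,
        stdin=subprocess.DEVNULL,   # not a tty, similar to a console embedded in a GUI
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()

print(child_stdin_encoding('utf-8'))
```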

1 Answer


After a bit of research, it seems that the strange behavior is at least partly built into Python (see this discussion thread). Briefly, Python 2 sets sys.stdin.encoding to None by default, and only changes it if it detects that stdin is attached to a tty and can determine that tty's encoding.
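The rule described in that thread can be sketched in Python 3 as a small hypothetical helper (pick_input_encoding and FakeStream are illustrative names, not part of Python or Spyder). It trusts the stream's reported encoding only when the stream is a tty and actually reports one, and otherwise falls back to an explicit default. Python 3 always sets sys.stdin.encoding when stdin exists, so the None case here only stands in for the Python 2 behaviour described above.

```python
def pick_input_encoding(stream, fallback='utf-8'):
    """Hypothetical helper: choose a codec for bytes read from `stream`.

    Use the stream's own encoding only if it is a terminal and reports one;
    otherwise use `fallback` instead of guessing.
    """
    encoding = getattr(stream, 'encoding', None)
    if stream.isatty() and encoding:
        return encoding
    return fallback

class FakeStream:
    """Stand-in for sys.stdin; real streams expose the same two attributes."""
    def __init__(self, encoding, tty):
        self.encoding = encoding
        self._tty = tty

    def isatty(self):
        return self._tty

terminal = FakeStream(encoding='UTF-8', tty=True)   # like the GNU terminal
console = FakeStream(encoding=None, tty=False)      # like Python 2 stdin inside Spyder
raw = 'tiθ'.encode('utf-8')

print(len(raw.decode(pick_input_encoding(terminal))))                    # 3
print(len(raw.decode(pick_input_encoding(console))))                     # 3, UTF-8 fallback
print(len(raw.decode(pick_input_encoding(console, fallback='latin-1')))) # 4, wrong fallback
```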

That said, a hackish workaround was simply to tell Spyder to use /usr/bin/python3 as its executable instead of the default (Python 2.7.6). When running a Python 3 console within Spyder or within a GNU terminal emulator, I get results that differ from (and are better than) before, but importantly the results are consistent, whether running a script or typing interactively, and whether in the GNU terminal or the Spyder console:

>>> len('tiθ')
3
>>> len(u'tiθ')
3
>>> import sys
>>> sys.stdin.encoding
'UTF-8'
>>> sys.getdefaultencoding()
'utf-8'
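In Python 3, as Martijn Pieters points out in the comments below, the u prefix is accepted but has no effect, so both literals are the same text string. The 4 that Python 2 reported for len('tiθ') now corresponds to the length of the UTF-8 encoded bytes. A quick check:

```python
# Python 3: str literals are Unicode text; the u prefix (allowed again since 3.3) is a no-op.
plain = 'tiθ'
prefixed = u'tiθ'
print(plain == prefixed)               # True
print(len(plain), len(prefixed))       # 3 3
print(len(plain.encode('utf-8')))      # 4, the byte count Python 2 reported for 'tiθ'
```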

This leads to other problems within Spyder, however: its sitecustomize.py script is not Python 3 friendly, for example, so every new interpreter starts with

Error in sitecustomize; set PYTHONVERBOSE for traceback:
SyntaxError: invalid syntax (sitecustomize.py, line 432)

The interpreter seems to work okay anyway, but it makes me nervous enough that I'm not considering this an "acceptable" answer, so if anyone else has a better idea...

drammock
  • What's your Spyder version? If it is older than `2.3.0`, you need to update it; that was the first version compatible with Python 3. I also say so because I don't see anything strange at line 432 of our current `sitecustomize.py`. – Carlos Cordoba Oct 11 '14 at 15:39
  • I checked with the `sitecustomize` of our 2.2 series and now I'm almost sure you need to update :) – Carlos Cordoba Oct 11 '14 at 16:00
  • In Python 3, the `u` prefix doesn't mean anything. `u'tiθ'` is *exactly the same thing* as `'tiθ'`. But the default encoding used by Python 3 is UTF-8, so that helps in your case. – Martijn Pieters Oct 11 '14 at 16:22
  • @MartijnPieters: yes, I know about all strings being unicode strings in Python 3; I was just a bit lazy with my copy-pasting from earlier tests. – drammock Oct 12 '14 at 00:03
  • @CarlosCordoba: good to know that 2.3+ is Py3 compatible; I was on 2.2.5. But for what it's worth it would be *really* nice to have things work as expected under Py2 as well. – drammock Oct 12 '14 at 00:03
  • @drammock Absolutely! We're working right now to fix the issues your question brought to our attention. Take a look at [this issue](https://code.google.com/p/spyderlib/issues/detail?id=2004) if you want to follow the progress. – Carlos Cordoba Oct 12 '14 at 18:40