3

I'm having some trouble with Python's raw_input function (Python 2.6). For some reason, raw_input does not accept the converted string that swedify() produces, and this gives me an encoding error. I'm aware of the encoding issue; that's why I wrote swedify() to begin with. Here's what I'm trying to do:

elif cmd in ('help', 'hjälp', 'info'):
    buffert += 'Just nu är programmet relativt begränsat,\nDe funktioner du har att använda är:\n'
    buffert += ' * historik :: skriver ut all din historik\n'
    buffert += ' * ändra <något> :: ändrar något i databasen, följande finns att ändra:\n'
    print swedify(buffert)

This works just fine; it outputs the Swedish characters to the console just as I want. But when I try to print this piece (in the same code, with the same \x?? values):

core['goalDistance'] = raw_input(swedify('Hur långt i kilometer är ditt mål: '))
core['goalTime'] = raw_input(swedify('Vad är ditt mål i minuter att springa ' +  core['goalDistance'] + 'km på: '))

Then I get this:

C:\Users\Anon>python löp.py
Traceback (most recent call last):
  File "l÷p.py", line 92, in <module>
    core['goalDistance'] = raw_input(swedify('Hur långt i kilometer är ditt mål: '))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe5' in position 5: ordinal not in range(128)

Now I've googled around and found some "solutions", but none of them work. Some said that I have to create a batch script that executes chcp ??? at the beginning, but that's not a clean solution IMO.

Here is swedify:

def swedify(inp):
    try:
        return inp.decode('utf-8')
    except:
        return '(!Dec:) ' + str(inp)

Any solutions on how to get raw_input to accept my return value from swedify()? I've tried from encodings import getencoder, getdecoder and others, but nothing changed for the better.

Alastair McCormack
Torxed
  • It works fine for me when I leave out `swedify` and just call `raw_input` on the Swedish prompt. – Ray Toal Sep 06 '11 at 06:29
  • @Ray-Toal Which Python version are you using? Also, do you mean when you do this: raw_input('Hur långt i kilometer är ditt mål: ')? That works for me as well, but then I get malformed characters in the console depending on which machine I'm running my code on, and I'm trying to find a universal way to output å ä ö to a console (across different operating systems, languages, and localizations). – Torxed Sep 06 '11 at 06:39
  • I tested removing the swedify calls too, and it works for raw_input. Note that on my machine I had to add this at the beginning to get Python to parse the script correctly: `# coding=utf-8`. Maybe it can help others. – Lynch Sep 06 '11 at 06:45
  • @Torxed Python 2.7.1. But it is on a Mac and I do not have codepage issues as my terminal is set to UTF-8. See Lynch's comment and try the coding=utf-8 declaration. – Ray Toal Sep 06 '11 at 06:49
  • @Torxed I don't think there is a _universal way_ to get consoles to display properly because consoles are native apps. I could be wrong though. In a web browser, using HTML, you can show the character `å` on all browsers with a character reference such as `&aring;`, and this works even if end users trick their browser into using a different encoding than the one sent by the server. But this is a hack similar to chcp, which you rightly want to avoid. – Ray Toal Sep 06 '11 at 06:53

6 Answers

3

For me it worked fine with:

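# -*- coding: utf-8 -*-
import sys

# Encoding the console reports for standard input
koden = sys.stdin.encoding

# raw_input wants a byte string, so encode the unicode prompt explicitly
a = raw_input(u'Frågan är öppen? '.encode(koden))
print a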
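One caveat with the snippet above: sys.stdin.encoding can be None, for example when input is piped, and the console encoding may not cover every character. Here is a minimal sketch of a more defensive variant. The helper names console_encoding and encode_prompt are made up for illustration, and falling back to locale.getpreferredencoding() is an assumption about what a sensible default is, not something tested on Windows 7.

```python
# -*- coding: utf-8 -*-
import locale
import sys


def console_encoding(stream=None):
    """Best guess at the console encoding (hypothetical helper).

    Falls back to the locale's preferred encoding, then to ASCII,
    when the stream does not report an encoding.
    """
    stream = sys.stdin if stream is None else stream
    return (getattr(stream, 'encoding', None)
            or locale.getpreferredencoding()
            or 'ascii')


def encode_prompt(text, encoding=None):
    """Encode a unicode prompt to bytes (hypothetical helper).

    Characters the target encoding cannot represent become '?'
    instead of raising UnicodeEncodeError.
    """
    return text.encode(encoding or console_encoding(), 'replace')


# Python 2 usage (raw_input does not exist in Python 3):
#     answer = raw_input(encode_prompt(u'Frågan är öppen? '))
```

The 'replace' error handler trades fidelity for robustness: a prompt with a stray '?' is usually preferable to a crash.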

Per

Per Persson
3

You mention the fact that you received an encoding error which motivated you to write swedify in the first place, and you have found solutions around chcp which is a Windows command.

On *nix systems with UTF-8 terminals, swedify is not necessary:

>>> raw_input('Hur långt i kilometer är ditt mål: ')
Hur långt i kilometer är ditt mål: 100
'100'
>>> a = raw_input('Hur långt i kilometer är ditt mål: ')
Hur långt i kilometer är ditt mål: 200
>>> a
'200'

FWIW, when I do use swedify, I get the same error you do:

>>> def swedify(inp):
...     try:
...         return inp.decode('utf-8')
...     except:
...         return '(!Dec:) ' + str(inp)
... 
>>> swedify('Hur långt i kilometer är ditt mål: ') 
u'Hur l\xe5ngt i kilometer \xe4r ditt m\xe5l: '
>>> raw_input(swedify('Hur långt i kilometer är ditt mål: '))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe5' in position 5: ordinal not in range(128)

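Your swedify function returns a unicode object. In Python 2, the built-in raw_input converts its prompt to a byte string, and for a unicode object that means an implicit encode with the default 'ascii' codec, which fails on å (hence the 'ascii' codec named in the traceback). A byte string passes through untouched: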

>>> raw_input("å")
åeee
'eee'
>>> raw_input(u"å")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe5' in position 0: ordinal not in range(128)
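The same failure can be reproduced without raw_input at all. This is a small sketch, assuming (as the traceback suggests) that the prompt goes through an implicit ASCII encode; try_ascii is a made-up helper name. It runs on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
# The unicode prompt that swedify() returns, written with escapes so the
# example does not depend on the source file's encoding.
prompt = u'Hur l\xe5ngt i kilometer \xe4r ditt m\xe5l: '


def try_ascii(text):
    """Return the UnicodeEncodeError raised by an ASCII encode, or None."""
    try:
        text.encode('ascii')
    except UnicodeEncodeError as exc:
        return exc
    return None


error = try_ascii(prompt)
print(error.start)  # 5, the same position the traceback reports

# Encoding explicitly (UTF-8 here, as an example) yields a byte string,
# which is what Python 2's raw_input accepts as a prompt.
explicit = prompt.encode('utf-8')
```

Which encoding to pick for the explicit encode depends on the console; UTF-8 is only right when the terminal really is UTF-8.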

You might want to try this in Python 3. See this Python bug.

Also of interest: How to read Unicode input and compare Unicode strings in Python?.

UPDATE According to this blog post there is a way to set the system's default encoding. This might be worth a try.

Community
Ray Toal
  • Correct, on a *nix system this would be unnecessary. But my friends are not as enlightened as us lucky ones: they're using Windows 7 with different language packs and "default languages", which makes it tricky to get a good overall solution without 100 workarounds. As you mentioned, raw_input does not take unicode strings, which I probably should have figured out. I sort of did, because I just moved the swedify() part out of the way and printed it alongside the raw_input, which wasn't all too pretty but works. raw_input(u'åäö>'.encode('iso-8859-15')) sort of works, but gives odd letters. – Torxed Sep 06 '11 at 06:57
  • You should still be able to get things to work because Windows 7 should support UTF-8 for its console app. Remember that Python's `raw_input` uses the encoding of `sys.stdin` so if you can force that encoding to be UTF-8, and do the same for `sys.stdout`, will it work? Sorry I don't have a Windows 7 box to test this on. – Ray Toal Sep 06 '11 at 07:02
  • That will work. I remember seeing a solution where they used decode(encode(u'...')) with 'replace' somehow, but I can't find it; I know it solved a lot of problems. Forcing stdin will work, yes, so I'll mark the post as a solution. Windows is a workaround no matter what :) Cheers Ray! – Torxed Sep 06 '11 at 07:16
  • @RayToal, the Windows console does **not** support UTF-8. There's a codepage that looks like it supports UTF-8, but it's broken beyond belief and causes all kinds of issues, especially around reading multi-byte input. – Alastair McCormack Dec 26 '15 at 12:35
  • Good to know. But it is hard to believe that one of the world's most popular operating systems chose to have a native terminal (console) application that does not deal with what is arguably the world's most popular encoding of Unicode. So the company behind the O.S. is fine to just leave "console support" to volunteers in the open source community to build support over the Console API? (If so, that strikes me as an example of truth being stranger than fiction :) ) – Ray Toal Dec 26 '15 at 18:16
2

On Windows, the console's native Unicode support is broken. Even the apparent UTF-8 codepage isn't a proper fix.

To read and write with the Windows console you need to use https://github.com/Drekin/win-unicode-console, which works directly with the underlying console API, so that multi-byte characters are read and written correctly.

Alastair McCormack
0

The Windows command prompt uses code page 850 with Swedish regional settings (https://en.wikipedia.org/wiki/Code_page_850). It's probably used for backwards compatibility with old MS-DOS programs.

You can set the Windows command prompt to use UTF-8 as its encoding by entering: chcp 65001 (see: Unicode characters in Windows command line - how?)
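To see why the code page matters, here is a small sketch comparing how å, ä and ö are stored in a few encodings. It also shows one plausible explanation for the "l÷p.py" in the question's traceback, assuming the file name was produced as cp1252 bytes and displayed by a cp850 console; that assumption is mine and was not verified on the asker's machine. The misread helper is a made-up name.

```python
# -*- coding: utf-8 -*-
swedish = u'\xe5\xe4\xf6'  # å ä ö

# The same three characters become different bytes in each encoding.
as_cp850 = swedish.encode('cp850')
as_latin1 = swedish.encode('latin-1')
as_utf8 = swedish.encode('utf-8')


def misread(text, written_as, shown_as):
    """Encode text one way and decode it another (hypothetical helper)."""
    return text.encode(written_as).decode(shown_as)


# ö is byte 0xF6 in cp1252; code page 850 renders 0xF6 as the division sign.
garbled = misread(u'l\xf6p.py', 'cp1252', 'cp850')
print(garbled == u'l\xf7p.py')  # True
```

Bytes written for one code page and displayed under another give exactly this kind of substitution, which is why a prompt has to be encoded for the console that will show it.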

Community
-1

Try this magic comment at the very top of your script:

# -*- coding: utf-8 -*-

Here is some information about it: http://www.python.org/dev/peps/pep-0263/

Fabian
  • Just for the record, that doesn't help all too much. It only declares which encoding is expected within the source file; it will not manage the actual output or input from, say, a socket or raw_input. – Torxed Sep 07 '11 at 13:35
-1

Solution to a lot of problems:


Edit C:\Python??\Lib\site.py and replace "del sys.setdefaultencoding" with "pass".

Then put this at the top of your code:

import sys
sys.setdefaultencoding('latin-1')

The holy grail of fixing the Swedish/non-UTF8 compatible characters.

Torxed