What is a good strategy when it comes to Python I/O and user input from command prompt?

Question

I have been freaking out lately because I spent a week writing a totally useless Python module that transforms some spatial data to .csv format.

I have no problem handling the spatial data, but when the software runs I ask the user to submit some input from the command prompt or Cygwin. After a lot of effort and googling I somehow got it to work with UTF-8.

I made the compromise of using only English and not the Greek that I needed, but now I get errors about English! Take a look at the error:

Please respond with 'yes' or 'no' or 'y' or 'n').
Would you like to add trips to the route with id ''no5leho'' and direction 0?
[y/n] y
Traceback (most recent call last):
  File "main.py", line 296, in <module>
    inputAddTrips = query_yes_no('Would you like to add trips to the route with id \'\'%s\'\' and direction 0?\r\n' % (i))
  File "main.py", line 33, in query_yes_no
    choice = input().lower()
  File "C:\Python34\lib\codecs.py", line 319, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcf in position 0: invalid continuation byte

I have tried all sorts of things, like setting the PYTHONIOENCODING system variable to utf-8 and encoding and decoding every single print and input().

I have used

#!/usr/bin/env python

and

# -*- coding: utf-8 -*-

But nothing changes. I still get those errors! So I want to ask: what do you guys do when it comes to input and output? I am a beginner, but I know that big websites and software are made in Python, so there must be a way to avoid all those errors!

score 0 · Answer 1 · edited May 23 '17 at 12:14

It's not about Greek or English. It's about encoding in general. When a user submits something, the encoding comes from the system, and most likely it is neither UTF-8 nor ASCII. Your error is "invalid continuation byte", which suggests the input is in a single-byte encoding such as ISO-8859-1. Maybe this thread will be helpful? UnicodeDecodeError, invalid continuation byte
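To see which encoding the system is actually handing to Python, it can help to print what the interpreter reports before guessing. A minimal sketch using only the standard library; the helper name `describe_io_encodings` is made up for illustration:

```python
import locale
import sys


def describe_io_encodings():
    """Return the encodings Python currently reports for console I/O."""
    return {
        # getattr guards against replaced streams that lack an encoding.
        "stdin": getattr(sys.stdin, "encoding", None),
        "stdout": getattr(sys.stdout, "encoding", None),
        # Encoding the locale prefers for text data.
        "locale": locale.getpreferredencoding(False),
    }


if __name__ == "__main__":
    for name, encoding in describe_io_encodings().items():
        print(name, "->", encoding)
```

If the values differ from what the console is really sending (for example after changing the code page), a decode error like the one in the question is a likely result.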

answered Jun 29 '15 at 17:26

Alex Ivanov

In my case the user submits data from command prompt or cygwin. Since I have changed character encoding in both to be utf-8 is there any other place where something changes the encoding? – dimrizo Jun 29 '15 at 17:39
From the system locale, I guess. – Alex Ivanov Jun 29 '15 at 18:17
The system locale was always set correctly. Anyway, thanks – dimrizo Jun 29 '15 at 21:22

score 0 · Answer 2 · edited May 23 '17 at 12:22

From python: how to convert a string to utf-8, you could convert to unicode and specify the encoding as utf-8. Failing that, you could tell Python to ignore the portions of a string that it can't convert to utf-8, using some basic error handling.
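As a rough illustration of what that error handling does, the sketch below decodes the same bytes with the three standard `errors` policies. The sample bytes and the helper name `decode_with_policy` are assumptions chosen to reproduce the "invalid continuation byte" message from the question; they are not taken from the asker's program.

```python
def decode_with_policy(data, encoding="utf-8"):
    """Decode `data` under each standard error policy and collect the results."""
    results = {}
    for policy in ("strict", "ignore", "replace"):
        try:
            results[policy] = data.decode(encoding, errors=policy)
        except UnicodeDecodeError as exc:
            # Only "strict" raises; keep the reason so it can be inspected.
            results[policy] = "error: " + exc.reason
    return results


# Hypothetical input: 0xCF starts a two-byte UTF-8 sequence, but "y" is not
# a valid continuation byte, so strict decoding fails.
sample = b"\xcfy"
for policy, outcome in decode_with_policy(sample).items():
    print(policy, "->", repr(outcome))
```

With `ignore` the bad byte silently disappears, while `replace` leaves a visible U+FFFD marker, which is usually easier to debug.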

answered Jun 29 '15 at 19:55

Andrew Winterbotham

Well, what happens to the ignored errors? Are they not shown at all? I don't want this. I am so disappointed with Python not being able to handle that basic stuff. This is basic I/O stuff. – dimrizo Jun 29 '15 at 21:25
I'm not quite sure what you mean. I'd really have to see more than what you have provided, and I don't know what kind of data you are getting in. However, to find out what kind of encoding you are dealing with, you can use [chardet](https://pypi.python.org/pypi/chardet). Let's say the encoding is latin-1; then the following might work: `s.decode('latin-1').encode("utf-8")`, where s is the input you are dealing with. See [here](http://stackoverflow.com/questions/9644099/python-ascii-codec-cant-decode-byte) for more details. – Andrew Winterbotham Jun 29 '15 at 21:54
I am ok with the "weid" data. I have a problem when I enter names from the command prompt! Both command prompt and python seem to use utf-8 but I get this error above! – dimrizo Jun 30 '15 at 07:17
I meant to say weird* data – dimrizo Jun 30 '15 at 07:24
That's annoying alright! The above post seems helpful though, and as a Linux user I can't be of any more help in dealing with problems related to Windows. You should definitely check out the following link too, it's quite useful: [http://www.joelonsoftware.com/articles/Unicode.html](http://www.joelonsoftware.com/articles/Unicode.html) – Andrew Winterbotham Jun 30 '15 at 09:38
No problem, hope you get the problem sorted! – Andrew Winterbotham Jun 30 '15 at 10:02

score 0 · Answer 3 · answered Jun 30 '15 at 08:21

Don't mess with PYTHONIOENCODING. It forces Python to use a particular encoding regardless of what the console actually supports, and is meant for cases where the command shell redirects Python output to a file in a particular encoding.
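To make the effect of the variable concrete, here is a small sketch that starts a child interpreter with PYTHONIOENCODING set and asks it which codec its redirected stdout uses. The helper name `child_stdout_encoding` is hypothetical, and the example assumes the child's output is captured through a pipe rather than written to a real console.

```python
import codecs
import os
import subprocess
import sys


def child_stdout_encoding(value):
    """Run a child Python with PYTHONIOENCODING=value and report its stdout codec."""
    env = dict(os.environ, PYTHONIOENCODING=value)
    output = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
    )
    reported = output.decode("ascii").strip()
    # Normalise aliases such as "latin-1" to the codec's canonical name.
    return codecs.lookup(reported).name


if __name__ == "__main__":
    print(child_stdout_encoding("latin-1"))
```

The child reports whatever the variable says, whether or not a console could display it, which is why the setting suits redirection better than interactive use.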

Windows consoles don't do UTF-8 well. Since you wanted Greek, what is your code page? Code page 737 is a Greek encoding. You also need a console font that supports Greek characters. I'm using Consolas font.

I'm on US Windows which defaults to code page 437. Switch to Greek:

C:\>chcp 737
Active code page: 737

Display all the characters supported by the code page:

C:\>py
Python 3.3.5 (v3.3.5:62cf4e77f785, Mar  9 2014, 10:35:05) [MSC v.1600 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> bytes(range(256)).decode('cp737')
'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\
x1f !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7fΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠ
ΡΣΤΥΦΧΨΩαβγδεζηθικλμνξοπρσςτυφχψ░▒▓│┤╡╢╖╕╣║╗╝╜╛┐└┴┬├─┼╞╟╚╔╩╦╠═╬╧╨╤╥╙╘╒╓╫╪┘┌█▄▌▐▀ωάέήϊίόύϋώΆΈΉΊΌΎΏ±≥≤ΪΫ÷≈°∙·√ⁿ²■\xa0'

Ask for some input using Greek characters. I just cut-and-pasted some supported characters, but if your Windows is configured for Greek you should be able to type directly:

>>> input('Greek? ')
Greek? ΡΣΤΥΦΧΨΩαβγδεζηθ
'ΡΣΤΥΦΧΨΩαβγδεζηθ'
>>>

Another option is to skip using the Windows console and get a decent Python IDE that supports UTF-8.
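If you do stay in the console, one way to avoid a hard crash is to read raw bytes and try a short list of candidate encodings yourself. This is only a sketch: the function name `decode_console_bytes` and the candidate list (UTF-8 first, then code page 737) are assumptions for a Greek setup, not something the console or Python prescribes.

```python
def decode_console_bytes(data, encodings=("utf-8", "cp737")):
    """Try each candidate encoding in order; return (text, encoding_used)."""
    for encoding in encodings:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Last resort: decode with the first candidate and mark bad bytes visibly.
    return data.decode(encodings[0], errors="replace"), encodings[0]


if __name__ == "__main__":
    # Simulated input: the same Greek word as sent by two differently
    # configured consoles.
    for raw in ("ναι".encode("utf-8"), "ναι".encode("cp737")):
        text, used = decode_console_bytes(raw)
        print(repr(text), "decoded as", used)
```

Raw bytes can be obtained with `sys.stdin.buffer.readline()`. Order matters here: cp737 maps every byte value to some character, so it has to come last or it will "succeed" on anything.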

Hey, thanks for answering! Isn't Greek included in UTF-8? If yes, why should I use 737? What IDE do you suggest? Some say Komodo, some say Eclipse with a plugin! — dimrizo, Jun 30 '15 at 09:26
Yes, UTF8 contains all of Unicode, but it is the Windows console that doesn't support it well. It's code page 65001 if you want to try it. — Mark Tolonen, Jun 30 '15 at 14:16
I get the second option but what is the first option you suggest? — dimrizo, Jun 30 '15 at 14:27
You've not given enough information to see how you get those errors. Whatever you have done to use UTF-8 isn't correct. Update your question with the code and steps to reproduce your problem and then I can suggest how to use the first option. — Mark Tolonen, Jun 30 '15 at 16:11
