# -*- coding: utf-8 -*-

I understand that this line is necessary when non-ASCII characters appear in a Python script file.

When I was learning Python, I was told that the two ways of running Python code (line by line in the interpreter vs. running a script file) would yield the same result. And they do, in most cases. But when a script contains non-ASCII characters, it turns out that I have to declare the encoding first.

Moreover, I have tried the exec() function to execute a string containing Python code.

>>> exec ("b='你'")

It works.

But if I save "b = '你'" to a script and run it, I get a syntax error.

I am curious why I don't need to declare an encoding when running Python code line by line in the interpreter.

Is there any difference in how the code is executed in these two cases?
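The script-file side of this can be sketched with the standard library's `tokenize.detect_encoding`, which looks for the PEP 263 encoding declaration in raw source bytes. This is only an illustration, and it assumes Python 3, where the default source encoding is UTF-8 rather than the ASCII default of Python 2 that produces the error described here. The helper name `source_encoding` and the choice of GBK for the sample file are assumptions made for the example.

```python
import io
import tokenize


def source_encoding(source_bytes):
    """Return the encoding declared for (or assumed by default for) source bytes.

    Raises SyntaxError when there is no declaration and the first lines
    are not valid UTF-8 (the Python 3 default).
    """
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding


# A hypothetical script saved as GBK: the bytes for '你' are not valid UTF-8.
body = "b = '你'\n".encode("gbk")

# With a declaration, the bytes can be decoded unambiguously.
declared = b"# -*- coding: gbk -*-\n" + body
print(source_encoding(declared))

# Without one, the default encoding is assumed and decoding fails.
try:
    source_encoding(body)
    undeclared_ok = True
except SyntaxError:
    undeclared_ok = False
print(undeclared_ok)
```

A string passed to exec() has already been decoded into text, so no declaration is consulted for it; only raw bytes read from a file need one.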

Thank you.

Richard Dally
altria

2 Answers


Because standard input already has an encoding (sys.stdin.encoding).

The encoding of stdin can come from various sources depending on the platform. On Apple and Windows it is predefined ("utf-8" on Apple and "mbcs" on Windows); otherwise it is determined from the current locale, as given by the LC_ALL environment variable or, if LC_ALL is missing, LANG.

If you're running under Linux you can, for example, run LC_ALL=en_GB.ascii python, and your example should fail.
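To see which encodings are in play on a given machine, something like the following can be run in the interpreter. It is a minimal sketch, written for Python 3; the function name `report_encodings` is made up for the example, and the values it prints depend entirely on the platform and environment.

```python
import locale
import sys


def report_encodings():
    """Collect the encodings relevant to interactive input on this machine."""
    return {
        # Encoding used to decode what is typed at the prompt (may be None
        # when stdin is not a regular text stream).
        "stdin": getattr(sys.stdin, "encoding", None),
        # Encoding derived from the current locale settings.
        "locale": locale.getpreferredencoding(False),
        # Default string encoding of the Unicode implementation;
        # a separate setting from the two above.
        "default": sys.getdefaultencoding(),
    }


for name, value in report_encodings().items():
    print(name, "=", value)
```

Running it once normally and once with a different LC_ALL value shows whether the locale changes what the prompt accepts.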

skyking

I suppose an interactive Python session uses the system encoding (see also Python Unicode strings and the Python interactive interpreter).

When it reads a source file, it needs to know how to interpret the data it is parsing. A script is not necessarily written in the same encoding as the terminal executing it, all the more so when the script is run without an environment that specifies the encoding.

It is also worth reading Joel Spolsky's The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets, which might explain why Python chose to require developers to be explicit about the encoding. I would have preferred a default of UTF-8, but their way is consistent with the Zen of Python.
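A small sketch of why the parser cannot simply guess: the same character is stored as different bytes depending on the encoding, and the same bytes decode to different text. The variable names below are illustrative only, and the snippet assumes Python 3.

```python
text = "你"

utf8_bytes = text.encode("utf-8")  # three bytes
gbk_bytes = text.encode("gbk")     # two bytes, different from the UTF-8 ones

# Decoding with the encoding that was used to write the bytes round-trips.
print(utf8_bytes.decode("utf-8") == text)

# Decoding with another encoding silently yields different characters...
mojibake = utf8_bytes.decode("latin-1")
print(mojibake == text)

# ...or fails outright, which is what a parser would run into.
try:
    gbk_bytes.decode("utf-8")
    gbk_as_utf8_ok = True
except UnicodeDecodeError:
    gbk_as_utf8_ok = False
print(gbk_as_utf8_ok)
```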

Community
bufh
  • No, the encoding that `sys.getdefaultencoding()` returns is not the same as the one used when parsing in interactive mode. For example, I get `"ascii"`, but I am still able to `print "ä"`. – skyking Aug 03 '15 at 09:00
  • Oops, you are right, I mixed up the system and default encodings while writing the answer; thank you for pointing that out. – bufh Aug 03 '15 at 09:11