Defining unicode variables in Python

Question

Recently, I have been reading about the Python source code encoding, especially PEP 263 and PEP 3120.

I have the following code:

# coding:utf-8

s = 'abc∂´ƒ©'
ƒ = 'My name is'
ß = '˚ß˙ˆ†ˆ∆ ßå®åø©ˆ'
print('s =', s)
print('ƒ =', ƒ, 'ß =', ß)

This code works fine in Python 3 but results in a SyntaxError in Python 2.7.
I understand that this may have nothing to do with source code encoding.
So, I would like to know whether there is a way to support Unicode variable names in Python 2.

More generally, I am having a hard time figuring out what practical problem the PEPs aim to solve, and how (and where) I can take advantage of the proposed solutions. I have read a few discussions on the topic, but they explain the correct syntax rather than answer my question.

score 8 · Accepted Answer · answered Sep 01 '17 at 13:42


No, Python 2 only supports ASCII names. From the language reference:

identifier ::=  (letter|"_") (letter | digit | "_")*
letter     ::=  lowercase | uppercase
lowercase  ::=  "a"..."z"
uppercase  ::=  "A"..."Z"
digit      ::=  "0"..."9"

Compare that to the much longer Python 3 version, which does support full Unicode names.
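
As a rough illustration of the difference, the Python 2 grammar above can be approximated with an ASCII-only regular expression, while Python 3 exposes its own rule through `str.isidentifier()`. This is only a sketch for Python 3: the helper name `is_py2_identifier` is made up here, and it does not exclude reserved keywords.

```python
import re

# ASCII-only pattern mirroring the Python 2 grammar quoted above.
# Hypothetical helper, not part of any library; keywords are not excluded.
_PY2_IDENTIFIER = re.compile(r'[A-Za-z_][A-Za-z0-9_]*\Z')

def is_py2_identifier(name):
    """Return True if `name` fits the Python 2 identifier grammar."""
    return _PY2_IDENTIFIER.match(name) is not None

for name in ['s', '_tmp1', 'ƒ', 'ß', '1abc']:
    # str.isidentifier() applies the Python 3 (Unicode-aware) rule.
    print(name, is_py2_identifier(name), name.isidentifier())
```

Names such as `ƒ` and `ß` fail the ASCII-only check but pass `str.isidentifier()`, which matches the SyntaxError seen only on Python 2.7.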

The practical problem the PEPs solve is this: before, if a byte over 127 appeared in a source file (say, inside a unicode string), Python had no way of knowing which character was meant, since the file could have been in any encoding. PEP 263 lets you declare the encoding with a header like the one in your code, and since PEP 3120 (Python 3) the source is interpreted as UTF-8 by default.
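
To make the ambiguity concrete, here is a small sketch (written for Python 3) that decodes the same byte with two different encodings, and then shows what a non-ASCII character looks like once encoded as UTF-8. The byte value and the two codecs are arbitrary choices for illustration.

```python
# One byte above 127 maps to different characters depending on the encoding.
raw = b'\xe9'
as_latin1 = raw.decode('latin-1')   # Western European interpretation
as_cp1251 = raw.decode('cp1251')    # Cyrillic interpretation
print(as_latin1, as_cp1251, as_latin1 == as_cp1251)

# In UTF-8 a non-ASCII character becomes a multi-byte sequence,
# and every byte of that sequence is above 127.
encoded = 'ƒ'.encode('utf-8')
print(encoded, all(b > 127 for b in encoded))
```

Without a declared (or default) encoding, the interpreter would have no basis for picking one of these readings over the other.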

answered Sep 01 '17 at 13:42

RemcoGerlich


I am sorry but I am unable to understand the meaning of "a byte over 127"? Do you mean to say that the ASCII code of a character is over 127? – Kshitij Saraogi Sep 01 '17 at 13:46
Yes. ASCII defines the meanings of bytes 0 to 127. Almost all encodings you'll see encode those values the same as ASCII. But values over 127 are not ASCII and are usually completely different characters in different encodings. – RemcoGerlich Sep 01 '17 at 13:48

This is the classic article: https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/ . – RemcoGerlich Sep 01 '17 at 13:49

fmedv · Answer 2 · 2017-09-04T07:55:42.103

1

I don't think those two PEPs are about encoding in the sense of a variable name being, say, a beta symbol; they concern the encoding of the variable's value.

So you could change your code to the following, which uses ASCII variable names and runs in Python 2.7:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

a = 'abc∂´ƒ©'
b = 'My name is'
c = '˚ß˙ˆ†ˆ∆ ßå®åø©ˆ'
# Note: print(...) with brackets is Python 3 syntax; in Python 2.7 the
# brackets are displayed too, because the arguments are printed as a tuple.
print 'a =', a
print 'b =', b, 'c =', c

Hope that answers your question.

Greetings Frame

edited Sep 04 '17 at 07:55

answered Sep 01 '17 at 13:42

fmedv


This would be a hack around the problem rather than a solution. BTW, my problem here is interoperability between Python2 and Python3. – Kshitij Saraogi Sep 01 '17 at 13:45

@KshitijSaraogi you can't expect perfect interoperability between the versions, there are things you can do in Python 3 that you simply can't do in Python 2. Special characters for variable names is one of those things. – Mark Ransom Sep 01 '17 at 13:50
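
For code that has to run on both versions, one possible approach is to keep the identifiers ASCII and put the non-ASCII text only in string values. A minimal sketch, assuming Python 2.7 or Python 3.3+ and a file saved as UTF-8; the names `f` and `ss` are placeholder replacements for `ƒ` and `ß`, and whether the output displays correctly still depends on the terminal's encoding:

```python
# -*- coding: utf-8 -*-
# The u'' prefix gives a unicode string on Python 2.7 and is accepted
# (as a no-op) on Python 3.3+, so the values are text on both versions.
s = u'abc∂´ƒ©'
f = u'My name is'           # ASCII name standing in for ƒ
ss = u'˚ß˙ˆ†ˆ∆ ßå®åø©ˆ'     # ASCII name standing in for ß

# A single parenthesised argument prints the same way on both versions.
print(u's = ' + s)
print(u'f = ' + f + u', ss = ' + ss)
```

Only the variable names change; the string contents from the question are kept as they were.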
