Does str type use ASCII encoder/decoder?

Question

On Python 2 REPL:

>>> import sys
>>> sys.stdin.encoding
'UTF-8'

My understanding is that, when I enter the expression below on stdin,

>>> stringLiteral = 'abc'

the interpreter reads the expression from stdin as UTF-8 and then interprets the code.

However, I have learnt that in Python 2 the str type stores 'abc' as a byte string, and that CPython stores it internally as a null-terminated C char * (i.e. an array of bytes terminated by \0).

What encoding scheme does the str class use to store 'abc' in memory? What decoding scheme does str use when 'abc' is printed?

Based on the answer, if I give the expression below:

>>> stringLiteralNonAsciiRange = 'abc정정💛'

then why does stringLiteralNonAsciiRange not print 정정💛? Why is the output 'abc\xec\xa0\x95\xec\xa0\x95\xf0\x9f\x92\x9b'?

Python 2 interprets string literals as `ASCII` bytes. `sys.stdin.encoding` is irrelevant, since a literal is not taken from `stdin` — juanpa.arrivillaga, Jun 06 '17 at 20:01
1. ASCII by default, unless you specify unicode (the `u` prepended to the string will be an indicator). 2. Try `print repr(stringLiteralNonAsciiRange)`. — cs95, Jun 06 '17 at 20:02
Just typing the name of a variable is NOT the same as printing it - it's more like `print repr(variable)`. The `repr` of a string uses escape sequences for all non-ASCII and non-printable characters, so that you can see exactly what's in the string. — jasonharper, Jun 06 '17 at 20:04
@erip My understanding is that the memory representation of Python 3's `bytes` type should be similar to that of Python 2's `str` type — overexchange, Jun 06 '17 at 20:04
You'll also note `'\xec\xa0\x95\xec\xa0\x95\xf0\x9f\x92\x9b'.decode('utf8')` gives `u'\uc815\uc815\U0001f49b'` which *is* `"정정💛"` — juanpa.arrivillaga, Jun 06 '17 at 20:08
@juanpa.arrivillaga Why does `'abc정정💛'.decode('utf-8')` give `u'abc\uc815\uc815\U0001f49b'` and not `abc정정💛`? That is still not clear to me. — overexchange, Jun 06 '17 at 20:13
There's a [difference](https://stackoverflow.com/questions/1436703/difference-between-str-and-repr-in-python) between an object's `__repr__` method and an object's `__str__` method. — erip, Jun 06 '17 at 20:14
@erip OK. So the `decode()` output in my previous comment has nothing to do with encoding/decoding. Thank you — overexchange, Jun 06 '17 at 20:17
@Shiva Is the ASCII decoder used in both cases, `print repr(stringLiteral)` and `print repr(stringLiteralNonAsciiRange)`, given that they are byte strings and nothing more than that? — overexchange, Jun 06 '17 at 20:26
Yes, in the latter case, ASCII doesn't recognise those characters, so it is printed as is (`abc\xec\xa0\x95\xec\xa0\x95\xf0\x9f\x92\x9b`). — cs95, Jun 06 '17 at 20:32
@Shiva If the ASCII encoding scheme were used to store `stringLiteralNonAsciiRange`, then `stringLiteralNonAsciiRange.decode('ascii')` should not fail. But it does fail, so there is a contradiction here. — overexchange, Jun 06 '17 at 22:14
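
The points raised in the comments above can be checked with a minimal sketch. It assumes Python 3, where `bytes` behaves much like Python 2's `str`, and it assumes the literal was typed on a UTF-8 terminal. The names `text`, `raw`, `ascii_ok` and `decoded` are illustrative only and do not come from the question.

```python
# Python 3 sketch: `bytes` here plays the role of Python 2's `str`.
# Assumption: the terminal encoded the typed characters as UTF-8.
text = "abc\uc815\uc815\U0001f49b"   # 'abc정정💛' as Unicode text
raw = text.encode("utf-8")            # the raw bytes a byte string would hold

# repr() escapes every byte outside printable ASCII. This is what the
# REPL echoes when only a variable name is typed.
print(repr(raw))

# The byte string carries no encoding of its own. Decoding is a separate,
# explicit step, and it only succeeds with a codec that matches the bytes.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

decoded = raw.decode("utf-8")

print(ascii_ok)          # whether the ASCII codec accepted the bytes
print(decoded == text)   # whether UTF-8 decoding round-trips
```

On this reading, the escapes in the echoed output come from `repr`, not from an ASCII decoder, and the failure of `decode('ascii')` is consistent with the bytes simply being stored as received rather than being "ASCII-encoded".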
