I'm setting up pytest for a flask app. In one of my tests, I make an assertion on a returned JSON data structure.
res = flask_app.get("/api/list_databases") # type: flask.wrappers.Response
assert res.json["status"] is "success"
Note that status does not refer to the HTTP status code in this context. It's an application-specific status attribute.
This assertion fails when I run the test.
AssertionError: assert 'success' is 'success'
I know I am using reference equality testing here, which is not strictly necessary, but this error got me very curious. As in, how is this possible?
If I call id(x) on both, I see they have different object ids. They are both instances of str (checked using type(x)).
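A minimal sketch that reproduces the observation outside Flask. It assumes the response body is decoded by something equivalent to the standard library's json.loads (an assumption about what res.json does internally), and the payload string is made up for illustration:

```python
import json

# Hypothetical payload standing in for the body returned by /api/list_databases.
payload = '{"status": "success"}'

expected = "success"
status = json.loads(payload)["status"]  # a str built at runtime by the decoder

print(type(status), type(expected))  # both are str
print(status == expected)            # value equality
print(status is expected)            # identity; depends on the implementation
print(id(status), id(expected))      # compare the object ids directly
```

Comparing with == checks the value, which is what the test is actually meant to assert.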
But from my (limited) understanding of Python, the following applies:
- All strings are made up of Unicode code points. Before they become strings (e.g. when they are read from disk or network) they are bytes and must be parsed with a specified (or default?) character encoding to become str instances.
- As a result, once initialized, a string will be made up of Unicode code points in whatever internal form the Python interpreter deems useful. This is different from Ruby, where a string exists together with encoding metadata, so in Ruby you can have both ISO 8859-1 strings and UTF-8 strings side by side.
- Seeing as strings are "normalized" in this way, it is impossible for the string føøbar to have two different byte representations within the Python interpreter, even if it's read from two different text files with different encodings.
- When the byte representation cannot differ, that means these two strings are backed by the exact same sequence of bytes.
- Strings in Python are immutable.
Because of this, the Python interpreter will not create multiple instances of the same string. Instead, new references will point to the first str object.
(Edit: This is wrong. See answers. Strings may be interned in some cases, but that's a CPython optimization, not part of the language specification.)
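The encoding part of the reasoning above can be checked with a small sketch. The variable names are only for illustration, and the two byte strings stand in for the contents of two differently encoded text files:

```python
text = "føøbar"

# The same text stored under two different encodings yields different bytes.
latin1_bytes = text.encode("iso-8859-1")
utf8_bytes = text.encode("utf-8")

# Decoding each with its own encoding gives back a str.
from_latin1 = latin1_bytes.decode("iso-8859-1")
from_utf8 = utf8_bytes.decode("utf-8")

print(latin1_bytes == utf8_bytes)  # the encoded forms differ
print(from_latin1 == from_utf8)    # the decoded strings compare equal
print(from_latin1 is from_utf8)    # identity is a separate question from equality
```

Equal values say nothing about whether the two names refer to one object, which is the step where the reasoning breaks down.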
Empirical evidence:
Python 3.5.3 (default, Apr 10 2018, 21:11:57)
[GCC 4.2.1 Compatible Apple LLVM 9.1.0 (clang-902.0.39.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> a = "foobar"
>>> b = "foobar"
>>> id(a)
4487164008
>>> id(b)
4487164008
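The matching ids in the session above appear to come from CPython reusing one object for identical literals, which is an implementation detail rather than a language guarantee. A hedged sketch of the contrast with a string assembled at runtime, using sys.intern from the standard library to force sharing explicitly:

```python
import sys

a = "foobar"
b = "".join(["foo", "bar"])  # same value, assembled at runtime

print(a == b)  # True: the values are equal
print(a is b)  # depends on the implementation; not guaranteed either way

# sys.intern returns the canonical object for a given string value,
# so equal strings map to the same interned object.
c = sys.intern(a)
d = sys.intern(b)
print(c is d)
```

Interning is something a program can opt into, but tests should still compare strings with ==.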
Which ultimately raises the question:
How is it possible to have two string objects (not references) with the same value?