
I'm setting up pytest for a Flask app. In one of my tests, I make an assertion on a returned JSON data structure:

res = flask_app.get("/api/list_databases") # type: flask.wrappers.Response
assert res.json["status"] is "success"

Note that status does not refer to the HTTP status code in this context. It's an application specific status attribute.

This assertion fails when I run the test.

AssertionError: assert 'success' is 'success'

I know I am using identity comparison (`is`) here, which is not strictly necessary, but this error got me very curious: how is this possible?

If I call id(x) on both, I see they have different object ids. Both are instances of str (checked with type(x)).
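A minimal reproduction without Flask, assuming that `res.json` is produced by decoding the response body with a JSON parser such as the standard-library `json` module. The parser builds a fresh `str` at runtime, so it compares equal to the literal but need not be the same object. The body string here is a made-up stand-in for the real response.

```python
import json

# Hypothetical response body standing in for what the Flask test client returns.
body = '{"status": "success"}'
payload = json.loads(body)

status = payload["status"]
expected = "success"

# Equality compares values: True here.
print(status == expected)

# Identity compares objects: implementation-dependent, so don't rely on it.
print(status is expected, id(status), id(expected))
```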

But from my (limited) understanding of Python, the following applies:

  • All strings are made up of Unicode code points. Before they become strings (e.g. when they are read from disk or network) they are bytes and must be decoded with a specified (or default?) character encoding to become str instances.
  • As a result, once initialized, a string will be made up of Unicode code points in whatever internal form the Python interpreter deems useful. This is different from Ruby, where a string exists together with encoding metadata. As such, in Ruby you can have both ISO 8859-1 strings and UTF-8 strings side by side.
  • Seeing as strings are "normalized" in this way, it is impossible for the string føøbar to have two different byte representations within the Python interpreter, even if it's read from two different text files, with different encodings.
  • When the byte representation cannot differ, that means these two strings are backed by the exact same sequence of bytes.
  • Strings in Python are immutable.
  • Because of this, the Python interpreter will not create multiple instances of the same string. Instead, new references will point to the first str object. (Edit: this is wrong; see the answers. Strings may be interned in some cases, but that is a CPython optimization, not part of the language specification.)
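The encoding point in the list can be checked directly: decoding "føøbar" from ISO 8859-1 bytes and from UTF-8 bytes gives equal `str` values even though the byte sequences differ. Note that this only establishes equal values; it says nothing about whether the interpreter reuses one object.

```python
# The same text encoded two different ways.
latin1_bytes = "føøbar".encode("iso-8859-1")
utf8_bytes = "føøbar".encode("utf-8")

# The raw byte sequences differ ('ø' is 1 byte in Latin-1, 2 bytes in UTF-8).
print(latin1_bytes != utf8_bytes)          # True
print(len(latin1_bytes), len(utf8_bytes))  # 6 8

# After decoding, both are plain str values made of the same code points.
from_latin1 = latin1_bytes.decode("iso-8859-1")
from_utf8 = utf8_bytes.decode("utf-8")
print(from_latin1 == from_utf8)            # True
```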

Empirical evidence:

Python 3.5.3 (default, Apr 10 2018, 21:11:57)
[GCC 4.2.1 Compatible Apple LLVM 9.1.0 (clang-902.0.39.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> a = "foobar"
>>> b = "foobar"
>>> id(a)
4487164008
>>> id(b)
4487164008
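In the REPL session above, the shared id comes from CPython interning the literal "foobar". If identity were ever actually wanted (for example, to speed up lookups on many repeated dictionary keys), the standard-library function `sys.intern` makes it explicit: it returns one canonical object for each distinct string value. This is a sketch of that function's behaviour, not something the test needs.

```python
import sys

# Build two equal strings at runtime so that neither is a compile-time constant.
a = "".join(["foo", "bar"])
b = "".join(["f", "oobar"])

print(a == b)  # True: same value

# sys.intern returns the single interned copy for each distinct string value.
ia = sys.intern(a)
ib = sys.intern(b)
print(ia is ib)  # True: interned copies are the same object
```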

Which ultimately raises the question:

How is it possible to have two string objects (not references) with the same value?

Niels B.
  • "Because of this, the Python interpreter will not create multiple instances of the same string." That's where you're wrong. This point does not follow from the previous points, and is not, in fact, generally true. – khelwood Aug 02 '18 at 15:13
  • Your last bullet point is false. CPython will indeed intern some strings, but that's an implementation detail. In general, you have to assume that this is not true. Compare equality with `==`, not `is`. – L3viathan Aug 02 '18 at 15:13
  • In this case, I am running the test with CPython, so my question is: why do I observe one behaviour in the interactive shell and another in my test? – Niels B. Aug 02 '18 at 15:15
  • `is` tests object identity, not equality. Use `==` instead. – user4815162342 Aug 02 '18 at 15:16
  • I know, my question already says so. My curiosity is about the interning of strings. – Niels B. Aug 02 '18 at 15:18
  • It's strange however that python does *not* intern this particular instance as `"success"` matches the rules for interning strings. – Wombatz Aug 02 '18 at 15:27
  • @Wombatz I do not think this qualifies, as the JSON construction under Flask surely rebuilds the string. – modesitt Aug 02 '18 at 15:28

1 Answer


This is a good question. Consider the following construction

>>> a = 'hello'
>>> b = ''.join(['h', 'e', 'l', 'l', 'o'])
>>> a == b
True
>>> a is b
False

Your example only works because CPython interns the strings: identifier-like string literals such as 'foobar' are interned when the code is compiled, so both names end up bound to the same object. The Python language does not guarantee this; the CPython implementation simply does it. Why and when these strings are interned is described fully in that link. Use == for your assert, and any time you are checking objects for equality.
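Applied to the question, the assertion becomes a value comparison. In this sketch a hypothetical helper, `fake_list_databases_json()`, stands in for `flask_app.get("/api/list_databases").json` so it runs without a Flask app; in the real test only the `==` line changes.

```python
import json


def fake_list_databases_json():
    """Hypothetical stand-in for flask_app.get("/api/list_databases").json."""
    return json.loads('{"status": "success"}')


def test_list_databases_status():
    payload = fake_list_databases_json()
    # Compare values with ==; `is` would test object identity instead.
    assert payload["status"] == "success"


# Called directly so the sketch runs as a plain script; pytest would collect it.
test_list_databases_status()
```

Since Python 3.8, CPython also emits a SyntaxWarning for `is` comparisons against a literal, which would have flagged the original assertion.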

modesitt
  • Thanks - can you please upvote the question, since somebody thinks it's a stupid/bad question. – Niels B. Aug 02 '18 at 15:19
  • I did, @NielsB. You clearly understand why you should use `==` and just wanted to know why you can't use `is`. – modesitt Aug 02 '18 at 15:20
  • Well, here is a short answer: `is` equality is not guaranteed by Python itself; it is only an implementation optimization in CPython. – Sraw Aug 02 '18 at 15:25
  • But I will still not upvote this question, as it is a duplicate of https://stackoverflow.com/questions/1504717/why-does-comparing-strings-in-python-using-either-or-is-sometimes-produce – Sraw Aug 02 '18 at 15:27