
I noticed the following holds:

>>> u'abc' == 'abc'
True
>>> 'abc' == u'abc'
True

Will this always be true, or could it depend on the system locale? (It seems strings are Unicode in Python 3, e.g. this question, but bytes in 2.x.)

smci
doctorlove

1 Answer


Python 2 coerces between unicode and str using the ASCII codec when comparing the two types. So yes, this is always true.
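To make the coercion concrete, here is a minimal sketch in Python 3 (where `str` and `bytes` never compare equal) that emulates the Python 2 behaviour described above. `py2_unicode_eq` is a hypothetical helper, not part of any library: it treats a `bytes` object as a Python 2 `str`, decodes it with an assumed default encoding of ASCII, and reports a failed decode as "unequal", which is what Python 2 does after issuing a `UnicodeWarning`. It models only `==`, not ordering or hashing.

```python
def py2_unicode_eq(text, raw, default_encoding="ascii"):
    """Hypothetical emulation of Python 2's ``unicode == str`` in Python 3.

    ``text`` stands in for a Python 2 ``unicode`` object and ``raw``
    (a ``bytes`` object) for a Python 2 ``str``. The byte string is
    implicitly decoded with the default encoding before comparing.
    """
    try:
        decoded = raw.decode(default_encoding)
    except UnicodeDecodeError:
        # Python 2 emits a UnicodeWarning here and treats the values as unequal.
        return False
    return text == decoded


# Pure-ASCII data decodes cleanly, so the comparison behaves as in the question.
print(py2_unicode_eq("abc", b"abc"))          # True
# Non-ASCII bytes cannot be decoded as ASCII, so they compare unequal.
print(py2_unicode_eq("\u00e9", b"\xc3\xa9"))  # False
# Python 3 itself never coerces: str and bytes are always unequal.
print("abc" == b"abc")                        # False
```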

That is to say, unless you mess up your Python installation and use sys.setdefaultencoding() to change that default. You cannot do that normally, because the sys.setdefaultencoding() function is deleted from the module at start-up time, but there is a Cargo Cult going around where people use reload(sys) to reinstate that function and change the default encoding to something else to try and fix implicit encoding and decoding problems. This is a dumb thing to do for precisely this reason.
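Why a changed default is dangerous shows up even in this tiny comparison. The sketch below (Python 3, with the same hypothetical `py2_unicode_eq` emulation redefined so the block stands alone) compares `u'é'` against its UTF-8 bytes under several candidate default encodings; the answer depends entirely on the process-wide setting.

```python
def py2_unicode_eq(text, raw, default_encoding):
    """Hypothetical emulation of Python 2's implicit decode-then-compare."""
    try:
        return text == raw.decode(default_encoding)
    except UnicodeDecodeError:
        # Python 2 warns and treats undecodable operands as unequal.
        return False


text = "\u00e9"    # u'é'
raw = b"\xc3\xa9"  # 'é' encoded as UTF-8, i.e. a Python 2 byte string

results = {enc: py2_unicode_eq(text, raw, enc) for enc in ("ascii", "utf-8", "latin-1")}
print(results)  # {'ascii': False, 'utf-8': True, 'latin-1': False}
```

Only the `'ascii'` result matches an unmodified Python 2; the other two correspond to what you would see after the `reload(sys)` hack, so code tested under one setting can silently behave differently under another.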

Martijn Pieters
  • What exactly is wrong with `sys.setdefaultencoding()`? – anatoly techtonik Apr 10 '15 at 10:42
  • @techtonik: changing the system default can break packages that rely on the default being ASCII, and changing it only *masks* the issues your code has with relying on implicit encoding and decoding. If you were to set it to Latin-1, all byte-to-unicode decodings would magically work but wouldn't actually make sense; if you set it to UTF-8, all unicode-to-byte encodings will work but may not make sense, etc. You are effectively pre-splinting your leg in case it breaks, rather than avoiding breaking your leg in the first place. – Martijn Pieters Apr 10 '15 at 11:23
  • Looks like a [workflow](https://xkcd.com/1172/) problem to me. Is there a more real/explicit example? – anatoly techtonik Apr 10 '15 at 12:43
  • @techtonik: I fail to see how this is an obscure side effect that some users want to maintain. That's frankly a ridiculous over-simplification of the issues. See [Dangers of sys.setdefaultencoding('utf-8')](https://stackoverflow.com/a/29561747) for a concrete example. – Martijn Pieters Apr 10 '15 at 12:47
  • @techtonik: if you are classifying this as an XKCD workflow issue, then we may as well switch Python to be a fully weakly typed language such as JavaScript. – Martijn Pieters Apr 10 '15 at 12:48
  • @techtonik I don’t get why you think that it’s a workflow issue when packages rely on the sane default value of something that cannot be changed in a non-hackish way. If anything, you trying to get around it just to change it so you get a “fixed behavior” (for your particular issues—or workflow) is the real workflow problem. – poke Apr 10 '15 at 13:06
  • Well, it is not a workflow issue, but rather a FUD issue, because it is hard (if not impossible) to find an explanation and examples that illustrate the bad behaviour and the aforementioned package conflict. The workflow comic actually contains a good, detailed explanation of the issue that person has, whereas this one is very abstract. So yes, I am basically hoping for good answers to https://stackoverflow.com/a/29561747 to appear. – anatoly techtonik Apr 10 '15 at 13:31
  • @anatolytechtonik see https://anonbadger.wordpress.com/2015/06/16/why-sys-setdefaultencoding-will-break-code/ – Alastair McCormack Jan 05 '17 at 09:17
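The point above about Latin-1 decodings "magically working" without making sense can be checked directly. This Python 3 sketch uses only plain `bytes.decode` (no emulation needed): ASCII rejects UTF-8 data loudly, while Latin-1, which maps every byte value 0 to 255 to a code point, never fails and silently produces mojibake. The euro sign is just an illustrative choice of non-ASCII text.

```python
euro_utf8 = "\u20ac".encode("utf-8")  # the euro sign as UTF-8 bytes: b'\xe2\x82\xac'

# With ASCII, the mismatch between bytes and text is reported loudly.
try:
    euro_utf8.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# Latin-1 maps each of the 256 byte values to a code point, so decoding never raises...
every_byte = bytes(range(256)).decode("latin-1")

# ...but UTF-8 data decoded this way is not the text it originally encoded.
mojibake = euro_utf8.decode("latin-1")
print(ascii_failed, repr(mojibake), mojibake == "\u20ac")
```

An ASCII default turns this kind of bug into an immediate exception; a Latin-1 default turns it into corrupted data that surfaces much later.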