5

I am trying to compare two strings in Python and noticed that when a dash/hyphen is present in the string, `is` does not treat identical strings as equal. For example:

>>> teststring = 'newstring'
>>> teststring is 'newstring'
True

Then, if I add a dash:

>>> teststring = 'new-string'
>>> teststring is 'new-string'
False

Why is that the case, and what would be the best way to compare strings with dashes?

Cœur
samuelschaefer

2 Answers

4

You should never use `is` to compare for equality anyway. `is` tests for identity. Use `==`.

Frankly, I don't know why `'newstring' is 'newstring'` evaluates to True. I'm sure it varies with your Python implementation; it looks like a memory-saving cache that reuses short strings.

However:

teststring = 'newstring'
teststring == 'newstring' # True

nextstring = 'new-string'
nextstring == 'new-string' # True

Basically, all `is` does is compare the ids of the two objects to check whether they are the same object.

id('new-string') # 48441808
id('new-string') # 48435352
# These change
id('newstring') # 48441728
id('newstring') # 48441728
# These don't, and I don't know why.
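
To make the difference concrete, here is a minimal sketch, assuming CPython. The variable names are only illustrative. One string is built at runtime so that it is very likely a separate object from the literal, and `sys.intern` (a real standard-library function) is used to show how equal strings can be made to share one object explicitly. Note also that Python 3.8 and later emit a `SyntaxWarning` when `is` is used directly against a literal, as in the question.

```python
import sys

literal = "new-string"
# Built at runtime; in CPython this is normally a distinct object
# from the literal above, even though the contents are the same.
built = "".join(["new", "-", "string"])

# == compares the values, so this is reliable everywhere.
print(literal == built)  # True

# is compares identity; the result is an implementation detail,
# so no expected output is given here.
print(literal is built)

# sys.intern returns the canonical interned copy of a string, so two
# equal strings always map to the same object after interning.
same_object = sys.intern(literal) is sys.intern(built)
print(same_object)  # True
```

Even with interning available, `==` is still the right tool for comparing string contents; explicit interning is mostly a memory or dictionary-lookup optimization.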
Adam Smith
  • 2
    See [About the changing id of a Python immutable string](http://stackoverflow.com/a/24245514) for why `is` works *sometimes*. – Martijn Pieters Jun 17 '14 at 18:03
  • 3
    From my answer there: *[T]he Python compiler will also intern any Python string stored as a constant, provided it is a valid identifier. The Python code object factory function PyCode_New will intern any string object that contains only letters, digits or an underscore*. – Martijn Pieters Jun 17 '14 at 18:05
  • Here's a deeper dive into what gets interned by default: http://guilload.com/python-string-interning/ – Ray Jun 06 '15 at 20:18
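
The rule quoted in the comments above (string constants made up only of letters, digits, or underscores get interned by the compiler) can be approximated with a small check. This is only a sketch: `looks_like_identifier` is a hypothetical helper, not part of CPython, and the assumption that "letters" means ASCII letters is mine. Other Python versions and implementations may intern more or fewer strings.

```python
import string

# Assumption: the "identifier-like" characters from the quoted comment
# are ASCII letters, digits, and the underscore.
_NAME_CHARS = frozenset(string.ascii_letters + string.digits + "_")


def looks_like_identifier(s):
    """Hypothetical helper: True if s uses only letters, digits, or '_'."""
    return all(ch in _NAME_CHARS for ch in s)


print(looks_like_identifier("newstring"))   # True
print(looks_like_identifier("new-string"))  # False, because of the dash
```

This matches the behavior in the question: `'newstring'` passes the check and the two literals end up as one object, while `'new-string'` does not.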
0

You should not use `is` for string comparison. `is` checks whether both operands are the same object. Use the equality operator `==` here instead; it compares the values of the objects rather than their ids.

In this case, it looks like Python is applying some object optimizations to string objects, hence the behavior.

>>> teststring = 'newstring'
>>> id(teststring)
4329009776
>>> id('newstring')
4329009776
>>> teststring = 'new-string'
>>> id(teststring)
4329009840
>>> id('new-string')
4329009776
>>> teststring == 'new-string'
True
>>> teststring is 'new-string'
False
ronakg
  • 1
    See [About the changing id of a Python immutable string](http://stackoverflow.com/a/24245514) about when Python interns strings (and identity tests work). – Martijn Pieters Jun 17 '14 at 18:04
  • Makes sense. So this is similar to what CPython does with small integer objects (-5 through 256), which are kept in memory all the time. Python never creates new objects for these ints; it just increments their reference counts as needed. – ronakg Jun 17 '14 at 18:07
  • 1
    Indeed. It is an implementation detail however, not something your code should rely on. – Martijn Pieters Jun 17 '14 at 18:08