2

Here is the example:

>>> first_string = str('This_is_some_how_cached')
>>> second_string = str('This_is_some_how_cached')
>>> id(first_string) == id(second_string)
True
>>> first_string = str('This_is_new_string')
>>> second_string
'This_is_some_how_cached'
>>>

In the above example, first_string and second_string are created separately, but they get the same id. Does that mean they point to the same object? If so, why doesn't second_string change when I assign a new string to first_string? Is the __new__ method of the string class doing some kind of caching for small strings, or is something else going on?

Could someone please explain?

James Sapam
  • 1
    You are asking two questions: One about string caching, and one about reassigning variable names. Please clarify what your real question is. – jcdyer Dec 08 '13 at 13:23
  • actually i was trying to figure out both, is there any problem with that ?? – James Sapam Dec 08 '13 at 13:25
  • 1
    Yes. Stack Overflow is a database of questions and answers. In the future, someone looking for the answer to a particular question should be able to search for it, and find it. To facilitate this, each post should contain a single question. When you combine multiple questions into one, it makes that more difficult. Please create separate posts for your questions. Alternatively, you could ask your questions on one of the python mailing lists. Start with python-list@python.org. http://www.python.org/community/lists/ – jcdyer Dec 08 '13 at 13:31
  • @jcdyer, Ya sorry about that, will do that next time. – James Sapam Dec 08 '13 at 13:33
  • 4
    I'd say it's no big deal. A person cannot ask 2 question if he only sees one problem. Even if the question cover multiple subject. It is still only one question and anyone that might think alike might find this answer. – Loïc Faure-Lacroix Dec 08 '13 at 13:49

3 Answers

5

Well, there is a reason why changing the first string isn't going to modify the second one.

Strings in python are immutable.

It's not exactly that strings are cached in Python; rather, you can't change them. Because of that, the interpreter is free to optimize and bind two names to the same object (the same id).

In python, you're never actually editing a string directly. Look at this:

a = "fun"
a.capitalize()
print(a)
>> fun

The capitalize method creates a capitalized version of a but doesn't change a. Another example is str.replace. As you have probably already noticed, to change a string using replace, you have to do something like this:

a = "fun"
a = a.replace("u", "a")
print(a)
>> fan

What you see here is that the name a is first bound to the string "fun". On the second line, we're binding a to a new object (with a new id), and the old string may get removed by the garbage collector if nothing else refers to it.

What you have to understand is that since strings are immutable, Python can safely have several names pointing to the same id, because the string will never get modified. You cannot have a string that gets modified implicitly.
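As a minimal sketch of the point above (the variable names are made up for illustration), rebinding one name leaves the other name, and the object it refers to, untouched:

```python
first = "This_is_some_how_cached"
second = first                 # both names now refer to the same str object
same_before = first is second  # identity check, equivalent to comparing ids

first = "This_is_new_string"   # rebinds the name `first`; nothing is modified
same_after = first is second   # the two names now refer to different objects

print(same_before, same_after, second)
```

Assignment only changes what a name refers to, so second keeps pointing at the original string.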

Also, you'll see that some other types, like numbers, are also immutable and show the same behaviour with ids. But don't be fooled by ids: whether two equal values share an id is an implementation detail.

CPython keeps a cache of small integers (from -5 to 256), so numbers outside that range may receive different ids even though they have the same value. And if I'm not mistaken, with bigger strings the ids can be different too.
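Since identity of equal values is not something to rely on, here is a hedged sketch of the safer pattern, assuming Python 3 (where the function lives at sys.intern; in Python 2 it was the builtin intern). Compare values with ==, and intern explicitly if a single shared object is really wanted:

```python
import sys

# Build two equal strings at run time so the compiler cannot merge them
# as constants of the same code block.
parts = ["This", "is", "some", "how", "cached"]
s1 = "_".join(parts)
s2 = "_".join(parts)

equal = (s1 == s2)   # value comparison: True, whatever the ids are
# Whether `s1 is s2` holds here is an implementation detail, so this
# sketch deliberately does not depend on it.

i1 = sys.intern(s1)
i2 = sys.intern(s2)
interned_same = (i1 is i2)   # interning returns one shared object per value

print(equal, interned_same)
```

The takeaway is to use == for comparing strings and numbers, and to treat id() and is as tools for inspecting object identity only.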

Note:

The ids might also differ depending on whether the code is evaluated inside a REPL or run as a program. I remember that code is optimized per code block, which means that executing the code on separate lines in the REPL might be enough to prevent the optimization.

Here's an example in the REPL:

>>> a = '[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]'; b = '[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]'
>>> id(a), id(b)
(4561897488, 4561897488)

>>> a = '[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]'
>>> b = '[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]'
>>> id(a), id(b)
(4561897416, 4561897632)

With numbers:

>>> a = 100000
>>> b = 100000
>>> id(a), id(b)
(140533800516256, 140533800516304)

>>> a = 100000; b = 100000 
>>> id(a), id(b)
(140533800516232, 140533800516232) 

But executing the same code as a Python script prints the following, because all the lines are compiled in the same code block (as far as I understand):

4406456232 4406456232
4406456232 4406456232
140219722644160 140219722644160
Loïc Faure-Lacroix
  • Regarding the ID thing with numbers larger than 256: Python only stores the actual numbers from -5 to 256. [Check out the documentation here](http://docs.python.org/2/c-api/int.html#PyInt_FromLong) for a hint regarding that. Anything else, it probably makes on the fly - that might be expected, given that it can't store *every* integer. – Matthew Dec 08 '13 at 13:36
  • Actually it's not exactly true, check this out, it all depends in which context the code is being evaluated. – Loïc Faure-Lacroix Dec 08 '13 at 13:37
  • care to explain the minus one here? – Loïc Faure-Lacroix Dec 15 '13 at 14:11
2

The strings aren't cached - they're literally the same string.

See, strings are immutable in Python. Just like the number 1 is the same number 1 no matter where you write it in your code, the string "Hello" is the same string no matter where you write it in your code.

Since it's immutable, you also can't change it in-place like you would a list or somesuch - for example, if you call list.reverse(), it changes the original list, but if you call str.replace("a", "b"), it returns a new string and the old string isn't affected (this is what it means to be immutable). Because you can't ever change that string, there's no point in Python having two different copies of "Hello" when they both mean exactly the same thing and neither can ever change.
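The contrast between list.reverse() and str.replace() can be sketched like this (the names are illustrative only):

```python
# A list is mutable: reverse() changes the object in place,
# so every name bound to that object sees the change.
items = [1, 2, 3]
alias = items
items.reverse()
print(alias)        # [3, 2, 1]

# A string is immutable: replace() returns a new string
# and leaves the original object as it was.
text = "banana"
other = text
replaced = text.replace("a", "b")
print(text, other, replaced)   # banana banana bbnbnb
```

Because no operation can alter "banana" itself, it is harmless for text and other to share one object.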

Edit - @Keeper has pointed out that there's a section of the Python FAQ detailing why strings are immutable and hence why they behave like this. Link

Matthew
  • 2
    Relevant question in the official python FAQ: http://docs.python.org/2/faq/design.html#why-are-python-strings-immutable – Keeper Dec 08 '13 at 12:57
  • I shall edit that link into the question! That's quite useful – Matthew Dec 08 '13 at 12:58
  • 1
    It seems this answer has been downvoted. I'd very much like to know why, if you're out there, mysterious downvoter! I'm always looking to improve my answers. – Matthew Dec 08 '13 at 13:16
  • They are the same string only when they are interned. It IS possible to have two strings that are equal to each other but not be literally the same thing. See this example: http://stackoverflow.com/questions/15541404/python-string-interning – Job Evers Oct 06 '16 at 22:27
0

Strings in Python are not cached :)

a = 'a'
b = 'a'
id(a) == id(b) == id('a')  # True in CPython: all three refer to the same constant object 'a'
a = 'z'  # rebinds the name a to 'z'; b is a separate name, so b is not affected
id(a) == id('z')  # a now refers to 'z', while b still refers to 'a'

You can do something like this to get the behaviour you may be after:

class Thing(object):  # dummy class; its instances can hold any attribute
    pass

a = Thing()
a.str = 'a'
b = a 
print(b.str)  # prints a, since a and b refer to the same object

a.str = 'b'

print(b.str)  # prints b: still the same object, but its attribute was changed
Chameleon