First, Python doesn't work the same way as C. In C, an array is just a block of memory. In Python, it's an object, and so is anything you pull out of it. That is why the id() of c and the id() of c[0] are not the same.
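To make that concrete, here is a minimal sketch. The sample string "hobo" and the variable names are just for illustration; the actual id values will differ on every run and machine.

```python
c = "hobo"      # hypothetical sample string
first = c[0]    # keep a reference so both objects are alive at the same time

print(type(c), id(c))
print(type(first), id(first))

# Two objects that are alive at the same time never share an id,
# so the whole string and its first character report different values.
ids_differ = id(c) != id(first)
print(ids_differ)
```

Nothing here depends on where the string's character data actually lives; id() only tells you about the two objects.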
Second, everything in Python is an object. In C, c[0] requests the first value from a sequence of memory locations. In Python, this is not necessarily the case. A standard list is backed by an array, but the address of that array is hidden from you. What id() shows you is the identity of the object (in CPython, its address). In this case, c is a string, but so is c[0] (there are no character types in Python). This means that when you ask for c[0], Python has to hand back a string object representing the character (or rather, substring) you requested. Fortunately, CPython doesn't actually create a new string every time, because it keeps and reuses 1-character strings automatically (at least for common characters such as ASCII letters).
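A quick way to convince yourself that there is no separate character type is to inspect what indexing returns. This is only a sketch; the helper is_single_char_str is a made-up name for illustration.

```python
c = "hobo"  # hypothetical sample string

first = c[0]
print(type(first))        # the type of an indexed element
print(len(first))         # its length
print(first[0] == first)  # indexing a one-character string gives a string again


def is_single_char_str(value):
    """Return True if value is a str of length 1."""
    return isinstance(value, str) and len(value) == 1


print(is_single_char_str(c[0]))
```

You can keep indexing (c[0][0][0] and so on) and you always get a str back, which would make no sense if c[0] were a C-style char.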
Also, keep in mind, Python objects have a structure and that consumes memory too. One of the best things about C is the ability to be very much in control of memory layout, but you lose that aspect in Python. The flip-side is that you don't have to do manual allocations and freeing of memory, which is a relief (I do a lot of C and Python programming, so I see the benefit).
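If you want to see that per-object overhead, sys.getsizeof gives a rough idea. The exact numbers depend on the interpreter version and platform, so treat this as a sketch and not as a statement about what you will see.

```python
import sys

empty = ""
one = "a"
hundred = "a" * 100

# Print the character count next to the reported object size in bytes.
for s in (empty, one, hundred):
    print(len(s), sys.getsizeof(s))

# Even the empty string reports a non-zero size: that is the object structure.
overhead = sys.getsizeof(empty)
print(overhead)
```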
Third, there is a lot more memory allocation and freeing happening in Python. Depending on how Python is built, and the underlying operating system strategy for allocating memory, any number of things could be happening to cause the address to not increase sequentially. But since everything is an object, there is an underlying allocation taking place for everything.
I had this doubt because I learnt C++ first (to be honest, Turbo C++) and the way strings' addresses are defined in Python is very different from what happens in C++. I guess this is okay in Python as we cannot access an object via its address in Python, am I right?
Yes and no. When you say c[0], under the hood a special method (__getitem__) is run to retrieve the substring from the string. This is different from what you get in C++. However, Python does store the string efficiently under the hood as a sequence of bytes. So just because you can't see that efficiency by examining addresses doesn't mean it's not there. Also, as I mentioned above, c[0] returns a string that represents the substring you asked for. Python is clever here: it returns a 1-character string, but it will be an interned string. You can see that some letters have the same address:
>>> for c in "hobo":
...     print c, id(c)
...
h 4434994600
o 4434861432
b 4434859712
o 4434861432
You can see that the string for "o" has the same address both times. BTW, the example is Python 2, but the same behavior exists in Python 3.
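For Python 3, a rough equivalent of that session could look like the sketch below. The function name ids_by_char is made up for this example, and the reuse of one object per character is CPython behavior, not something the language promises.

```python
from collections import defaultdict


def ids_by_char(text):
    """Map each character to the set of ids observed for it while iterating."""
    seen = defaultdict(set)
    for ch in text:
        seen[ch].add(id(ch))
    return dict(seen)


# Each character is listed once with every id that was seen for it.
for ch, ids in ids_by_char("hobo").items():
    print(ch, sorted(ids))
```

If the interpreter reuses the 1-character strings, "o" shows up with a single id even though it occurs twice.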
And you are correct, you cannot access an object by its address; at least, that's not a feature of the language. How ids are generated is an implementation detail, so you should not count on every Python interpreter doing it this way.
Also, what is the point in having different addresses for c and c[0]? These questions may be unnecessary for some, but I am too curious to know how Python allocates addresses to various datatypes, especially (here) strings.
I explained this above, but to recap: c and c[0] don't mean the same thing they do in C. In Python, the first is the string and the second requests a substring containing the first character of the string.
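The "request" is an ordinary method call in disguise. A small sketch, with variable names chosen only for this example:

```python
c = "hobo"  # hypothetical sample string

via_brackets = c[0]
via_method = c.__getitem__(0)          # the same request, spelled out
via_type = type(c).__getitem__(c, 0)   # roughly how the lookup goes through the type

print(via_brackets, via_method, via_type)
```

All three ask the str type for the item at index 0, and all three get a string back.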
Python does use an arena-style memory management scheme in many areas, but for the most part you don't need to care about that. If you're curious, I suggest you take a look at the Python source code. The Python subdirectory of the source tree has many of the language and low-level runtime support bits. Also realize that Python pre-caches some things, which can also explain the discrepancy in addresses that you see above.
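One well-known example of that pre-caching in CPython, which I haven't named above and which is my own illustration, is small integers. The sketch below parses the same number twice at runtime and checks whether one object came back; same_object is a made-up helper, and the results are implementation-specific.

```python
def same_object(text):
    """Parse the same integer text twice and report whether one object came back."""
    a = int(text)
    b = int(text)
    return a is b


# On CPython, small values are typically served from a cache,
# while large values are usually created fresh each time.
print(same_object("7"))
print(same_object("123456789"))
```

Either way, this is why you should compare values with == and leave is and id() for the cases where you really mean "the same object".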