Why are the addresses of different datatypes different [Python]?

Question

>>> a=5
>>> b=6
>>> id(a)
10914496
>>> id(b)
10914528
>>> c='Hello'
>>> d='World'
>>> id(c)
139973573252184
>>> id(d)
139973616356744
>>> e=(4>5)
>>> f=(4<5)
>>> id(e)
10739968
>>> id(f)
10740000

Why is the length of a string's address so different from that of a boolean/int?
Why do successive declarations have such large differences in their addresses, compared with the size of the datatype?

Update #1

>>> id(c)
139973616356856
>>> id(c[0])
139973652926112
>>> id(c[1])
139973653190728
>>> id(c[2])
139973653634272
>>> id(c[3])
139973653302104

I had this doubt because I learnt C++ first (to be honest, Turbo C++), and the way strings' addresses are defined in Python is very different from what happens in C++. I guess this is okay in Python because we cannot access an object via its address in Python, am I right?

Also, what is the point in having different addresses for c and c[0]? These questions may be unnecessary for some, but I am very curious to know how Python allocates addresses to various datatypes, especially (here) strings.

Also, please explain the downvote, as this is my first ever question on this site. — MrObjectOriented, Aug 26 '17 at 12:52

pointerless · Answer 1 · 2017-08-26T13:15:38.277

Depending on your computer's architecture, datatypes will be stored in memory using different numbers of bytes. For example, each ASCII character in a string needs a single byte to store it, while an integer needs more bytes the larger the number being stored (Python ints have arbitrary precision). I'm not totally sure, but Python may also store different datatypes in different areas of its allocated memory.
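A quick way to see that different objects take different amounts of memory is `sys.getsizeof`. This is a sketch assuming CPython 3; the exact numbers are implementation details that vary between versions and builds, so the example compares sizes rather than relying on specific values.

```python
import sys

# Python ints have arbitrary precision: bigger values need more internal "digits".
small = 1
big = 2 ** 100
print(sys.getsizeof(small), sys.getsizeof(big))

# In CPython 3, an all-ASCII str stores one byte per character (PEP 393),
# so each extra ASCII character adds one byte on top of a fixed header.
one = "a"
two = "ab"
print(sys.getsizeof(one), sys.getsizeof(two))
print("bytes per extra ASCII char:", sys.getsizeof(two) - sys.getsizeof(one))
```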

Python also stores a lot more in its allocated memory than just the variables you give it: the interpreter itself (and the IDE, if it runs in the same process) keeps its own objects there too. So some other object may have been allocated between two of your allocations.

For update #1, take a look at this

I get your point. Thank you. Also do have a look at my update#1. — MrObjectOriented, Aug 26 '17 at 13:13

score 1 · Answer 2 · answered Aug 26 '17 at 13:53

That id() values happen to be memory addresses in CPython is an implementation detail; ids are only guaranteed to be distinct for objects that exist at the same time.
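A small sketch of what that guarantee does and doesn't cover: while all the objects in a list are alive, their ids must differ, and `is` is the same as comparing ids. Once an object is freed, CPython may reuse its memory for a new object, so comparing ids of short-lived temporaries tells you nothing.

```python
# Keep 1000 objects alive at the same time: their ids must all be distinct.
alive = [object() for _ in range(1000)]
ids = {id(o) for o in alive}
print(len(ids))  # 1000

# `a is b` is equivalent to id(a) == id(b) while both objects are alive.
x = alive[0]
print(x is alive[0], id(x) == id(alive[0]))

# Ids of temporaries may or may not coincide, because the first object's memory
# can be reused for the second one; do not rely on either result.
print(id(object()) == id(object()))
```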

The grouping you've observed is because CPython pre-creates a number of objects, including the integers -5 through 256 as well as True and False. Under ordinary circumstances, those values won't appear at any other addresses, which is possible because they are of immutable types.
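You can watch that cache at work with `is`. This sketch is CPython-specific (the -5..256 range is an implementation detail, not a language guarantee); `int("...")` is used so the values are created at run time instead of being shared as compile-time constants.

```python
# Build the numbers at run time so the compiler cannot share constants.
cached_a = int("256")
cached_b = int("256")
print(cached_a is cached_b)  # True in CPython: 256 comes from the small-int cache

fresh_a = int("257")
fresh_b = int("257")
print(fresh_a is fresh_b)  # False in CPython: 257 is outside the cache

# True and False are singletons, so every comparison returns one of the two.
print((4 < 5) is True, (4 > 5) is False)  # True True
```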

The second question, about indexing into a string, arises because Python's string objects don't refer to each other. There is no character type, so extracting a character from a string produces a new string. Again, some of those may be cached (interned strings). The address of the string object is not necessarily the address of its contents.
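A short illustration, assuming Python 3: indexing a `str` gives back another `str` of length 1, which is a separate object from the original string.

```python
c = "Hello"
first = c[0]

print(type(first))     # <class 'str'>: there is no separate char type
print(len(first))      # 1
print(first is c)      # False: a different object from the whole string
print(first == c[:1])  # True: same value as a one-character slice
```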

The C types you are familiar with can be accessed with ctypes, but doing so is typically awkward and risky. For instance, if you pass a Python string to a function that alters a C string, you break the string itself; Python expects strings to be immutable, and may share them and cache their hashes.
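If you want C-style contiguous bytes whose addresses you can inspect, the safer route in ctypes is a buffer you own rather than a Python `str`. A sketch: `ctypes.create_string_buffer` allocates a mutable C `char` array, `ctypes.addressof` returns the address of its first byte, and element `i` lives at that address plus `i`, just like a C array.

```python
import ctypes

# A mutable, NUL-terminated C char array holding b"Hello" (6 bytes incl. NUL).
buf = ctypes.create_string_buffer(b"Hello")
base = ctypes.addressof(buf)

print(ctypes.sizeof(buf))  # 6
for i in range(5):
    # Read one byte straight from memory at base + i, as C would with buf[i].
    byte = ctypes.string_at(base + i, 1)
    print(hex(base + i), byte)

# Mutating this buffer is fine because Python never treats it as an immutable str.
buf[0] = b"J"
print(buf.value)  # b'Jello'
```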

score 1 · Accepted Answer · answered Aug 26 '17 at 13:59

First, Python doesn't work the same way as C. In C, an array is just a block of memory; in Python, it's an object. That is why the id() values of c and c[0] are not the same.

Second, you should realize that everything in Python is an object. In C, when you write something like c[0], you are requesting the first value from a sequence of memory locations. In Python, this is not necessarily the case. A standard list is backed by an array, but the address of that array is hidden from you; what you see via id() is the address of the object. In this case, c is a string, but so is c[0] (there is no character type in Python). This means that when you ask for c[0], Python creates a new string to represent the character (or rather, substring) you requested. Fortunately, CPython doesn't actually create a new string every time, because it automatically caches 1-character strings (for Latin-1 characters).
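A quick check of that caching, assuming CPython 3, where single-character strings for code points below 256 are cached; other implementations may behave differently.

```python
s = "Hello"
# s[2] and s[3] are both "l"; CPython hands back the same cached 1-char object.
print(s[2], s[3])
print(s[2] is s[3])  # True in CPython
```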

Also, keep in mind, Python objects have a structure and that consumes memory too. One of the best things about C is the ability to be very much in control of memory layout, but you lose that aspect in Python. The flip-side is that you don't have to do manual allocations and freeing of memory, which is a relief (I do a lot of C and Python programming, so I see the benefit).
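Even "empty" objects carry that structure. A sketch using `sys.getsizeof`, assuming CPython, where every object starts with a header (reference count and type pointer) before any payload; exact sizes vary by version and platform, so only the relationships matter here.

```python
import sys

for obj in (0, "", (), []):
    # Non-zero even for empty/zero values: this is per-object overhead.
    print(repr(obj), sys.getsizeof(obj))

# Three ASCII characters cost far more than 3 bytes once the header is included.
print(sys.getsizeof("abc"), "bytes for a 3-character str")
```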

Third, there is a lot more memory allocation and freeing happening in Python. Depending on how Python is built and on the operating system's strategy for allocating memory, any number of things could cause addresses not to increase sequentially. And since everything is an object, there is an underlying allocation taking place for everything.

I had this doubt because I learnt C++ first (to be honest, Turbo C++), and the way strings' addresses are defined in Python is very different from what happens in C++. I guess this is okay in Python because we cannot access an object via its address in Python, am I right?

Yes and no. When you write c[0], a special method runs under the hood to retrieve the substring from the string. This is different from what you get in C++. However, Python does store the string efficiently under the hood as a sequence of bytes, so just because you can't see that efficiency by examining addresses doesn't mean it's not there. Also, as I mentioned above, c[0] returns a new string that represents the substring you asked for. Python is clever here: the 1-character string it returns will be an interned one. You can see that some letters have the same address:

>>> for c in "hobo":
...     print c, id(c)
...
h 4434994600
o 4434861432
b 4434859712
o 4434861432

You can see that both "o" characters share the same address. BTW, the example is Python 2, but the same behavior exists in Python 3.
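For reference, here is a Python 3 version of that loop, using `ch` instead of rebinding `c`. The printed ids will differ from run to run; what matters, in CPython, is that both "o" characters report the same id.

```python
word = "hobo"
seen = {}
for ch in word:
    print(ch, id(ch))
    # Collect every id observed for each letter.
    seen.setdefault(ch, set()).add(id(ch))

# In CPython each repeated letter maps to one cached object, so one id per letter.
print(seen["o"])
```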

And you are correct: you cannot access an object by its address, at least not as a feature of the language. How ids are generated is an implementation detail, and you shouldn't count on every Python interpreter doing it this way.
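To make the "not a feature of the language" point concrete: the only way back from an id to an object is to lean on the CPython detail that id() is the object's address. The sketch below does that with ctypes purely as a demonstration; it is CPython-only, and using the id of an object that has already been freed can crash the interpreter, so it has no place in real code.

```python
import ctypes

obj = ["some", "list"]
addr = id(obj)  # in CPython this happens to be the object's memory address

# Reinterpret the integer as a PyObject* (CPython-only hack, unsafe in general).
same = ctypes.cast(addr, ctypes.py_object).value
print(same is obj)  # True, but only because obj is still alive
```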

Also, what is the point in having different addresses for c and c[0]? These questions may be unnecessary for some, but I am very curious to know how Python allocates addresses to various datatypes, especially (here) strings.

I explained this above, but to recap: c and c[0] mean something different than in C. In Python, the first is the string, and the second requests a substring containing the first character of the string.
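The "special method" behind `c[0]` is `__getitem__`. A minimal sketch showing that subscription and an explicit call on the type produce the same one-character string; the explicit call is only for illustration, since you would normally just write `c[0]`.

```python
c = "Hello"

# c[0] is evaluated by calling the type's __getitem__ with the index.
via_subscript = c[0]
via_method = type(c).__getitem__(c, 0)

print(via_subscript, via_method)       # H H
print(via_subscript == via_method)     # True
print(isinstance(via_subscript, str))  # True: the result is itself a str
```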

Python does use an arena-style memory management scheme in many areas, but for the most part you don't need to care about that. If you're curious, I suggest taking a look at the Python source code; its Python subdirectory contains many of the language and low-level runtime support bits. Also realize that Python pre-caches some things too, which can also explain the discrepancy in addresses that you see above.

