Reading Data Structures & Algorithms Chapter 5 about array based sequences I came into misunderstanding / mistake that I need to clarify.
In the book the Autor mentions that in order to implement python lists - dynamic arrays(can hold different types of elements) python uses referential arrays
so no matter what kind of data you are inserting, there is fixed length of addresses.
Then, the Author mentions that for strings for example, because we know that each elements is Unicode char, that can be represented in 64 bit, python uses compact arrays - holds the fixed length data and not references as mentioned above in regular lists.
Till here, everything fine and understood, the problem is in the section below -
Compact arrays have several advantages over referential structures in terms of computing performance. Most significantly, the overall memory usage will be much lower for a compact structure because there is no overhead devoted to the explicit storage of the sequence of memory references (in addition to the primary data). That is, a referential structure will typically use 64-bits for the memory address stored in the array, on top of whatever number of bits are used to represent the object that is considered the element. Also, each Unicode character stored in a compact array within a string typically requires 2 bytes. If each character were stored independently as a one-character string, there would be significantly more bytes used.
still, everything fine, but look at the math below -
As another case study, suppose we wish to store a sequence of one million, 64-bit integers. In theory, we might hope to use only 64 million bits. However, we estimate that a Python list will use four to five times as much memory. Each element of the list will result in a 64-bit memory address being stored in the primary array, and an int instance being stored elsewhere in memory. Python allows you to query the actual number of bytes being used for the primary storage of any object. This is done using the getsizeof function of the sys module. On our system, the size of a typical int object requires 14 bytes of memory (well beyond the 4 bytes needed for representing the actual 64-bit number). In all, the list will be using 18 bytes per entry, rather than the 4 bytes that a compact list of integers would require.
The typical int require 4 bytes, the address for 64bit system requires another 8 bytes, so how he ended up in 18 bytes per entry?
I'm adding some interesting test that I would love to understand the implementation in python under the hood if possible.