It's unclear what kind of arrays you really want, or for what purpose. But talk of contiguous (or continuous) storage and caching suggests that you aren't clear about how Python works.
First, Python is object oriented, all the way down. Integers, strings and lists are all objects of some class, with associated methods and attributes. For builtin classes we have little say about the storage.
Let's make a small list:
In [89]: alist = [1,2,3,1000,1001,1000,'foobar']
In [90]: alist
Out[90]: [1, 2, 3, 1000, 1001, 1000, 'foobar']
A list has a data buffer that stores references (pointers if you will) to objects elsewhere in memory. The id may give some idea of where, but it shouldn't be understood as a 'pointer' in the C language sense.
For this list:
In [91]: [id(i) for i in alist]
Out[91]:
[9784896,
9784928,
9784960,
140300786887792,
140300786888080,
140300786887792,
140300786115632]
1, 2 and 3 have small id values because CPython creates the small integers (-5 through 256) once at startup. So every use of one of them refers to the same object, with the same id.
In [92]: id(2)
Out[92]: 9784928
Within the list creation, 1000 appears to be unique (both entries have the same id), but that's not so outside of that context. A new 1001 has a different id from the one in the list:
In [93]: id(1001)
Out[93]: 140300786888592
Looks like the string is cached as well - but that's just the interpreter's choice, and we shouldn't count on it.
In [94]: id('foobar')
Out[94]: 140300786115632
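To make the caching point concrete, here is a small sketch. It assumes CPython; the variable names are mine, and building the integers with int() at run time is just a way to keep the compiler from merging identical constants. Other interpreters, or future versions, may behave differently, which is exactly why identity shouldn't be relied on.

```python
# CPython-specific behaviour; treat it as an implementation detail.
a = int("256")
b = int("256")
small_shared = a is b      # both names refer to the one cached small integer

c = int("1000")
d = int("1000")
large_equal = c == d       # equal values ...
large_shared = c is d      # ... but two separately created objects in CPython

print(small_shared, large_equal, large_shared)
```

The practical rule is to compare values with == and keep is for things like None.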
The reversed list is a new list, with its own pointer array. But the references are the same:
In [95]: rlist = alist[::-1]
In [96]: rlist
Out[96]: ['foobar', 1000, 1001, 1000, 3, 2, 1]
In [97]: rlist[5],id(rlist[5])
Out[97]: (2, 9784928)
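The same thing can be checked without reading id values by eye. This is a minimal sketch that reuses the alist and rlist names from the session above; same_objects and distinct_lists are just illustrative names.

```python
alist = [1, 2, 3, 1000, 1001, 1000, 'foobar']
rlist = alist[::-1]

# The reversed slice is a different list object ...
distinct_lists = rlist is not alist

# ... but each slot refers to the very same element object as the original.
same_objects = all(a is b for a, b in zip(alist, reversed(rlist)))

print(distinct_lists, same_objects)
```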
The cost of an indexing action like [::-1] should just depend on the number of items in the list. It doesn't depend on where the values actually are in memory. The same goes for other copies. Even appending to the list takes a relatively constant time (the list maintains growth space in its data buffer). Actually working with the objects in the list may depend on where they are stored in memory, but we have little say about that.
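The growth space can be observed indirectly. This sketch assumes CPython, where sys.getsizeof on a list reflects the allocated pointer buffer; the exact sizes and the growth pattern are implementation details, so only the overall shape matters.

```python
import sys

growing = []
sizes = []
for i in range(64):
    growing.append(i)
    sizes.append(sys.getsizeof(growing))   # size of the list object, not its elements

# Count how often the reported size changed from one append to the next.
resizes = sum(1 for prev, cur in zip(sizes, sizes[1:]) if cur != prev)
print(f"{len(growing)} appends, size changed {resizes} times")
```

Most appends leave the size unchanged, because they just fill a slot that was already allocated.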
A "2d" list is actually a list with list elements, that is, nested lists. The sublists are stored elsewhere in memory, just like strings and numbers. In that sense the nested lists are not contiguous.
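A quick way to see that the outer list only holds references to its sublists is to copy it and modify a row through the copy. The names here are made up for the illustration.

```python
nested = [[1, 2, 3], [4, 5, 6]]

# A slice copies the outer pointer array only; the rows themselves are not copied.
shallow = nested[:]
outer_is_new = shallow is not nested
rows_shared = shallow[0] is nested[0]

# Changing a row through the copy is visible through the original.
shallow[0].append(99)
print(nested[0])
```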
So what about arrays? (Below, np is numpy, imported with import numpy as np.)
In [101]: x = np.arange(12)
In [102]: x
Out[102]: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
In [104]: x.__array_interface__
Out[104]:
{'data': (57148880, False),
'strides': None, # default (8,)
'descr': [('', '<i8')],
'typestr': '<i8',
'shape': (12,),
'version': 3}
In [105]: x.nbytes # 12*8 bytes
Out[105]: 96
x is an ndarray object, with attributes like shape, strides and dtype, and a data buffer. In this case the buffer is a C array 96 bytes long, at 57148880. We can't use that number directly, but I find it useful when comparing this __array_interface__ dict across arrays. A view in particular will have the same, or a related, value.
In [106]: x.reshape(3,4)
Out[106]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
In [107]: x.reshape(3,4).__array_interface__['data']
Out[107]: (57148880, False)
In [108]: x.reshape(3,4)[1,:].__array_interface__['data']
Out[108]: (57148912, False) # 32 bytes later
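The address arithmetic can be checked in code rather than by eye. This is a sketch under a couple of assumptions: I fix the dtype to np.int64 so the item size is 8 bytes on every platform, and I use np.shares_memory as an independent check that the reshape is a view.

```python
import numpy as np

x = np.arange(12, dtype=np.int64)
y = x.reshape(3, 4)                      # a view for a contiguous array like this

base_addr = x.__array_interface__['data'][0]
view_addr = y.__array_interface__['data'][0]
row_addr = y[1, :].__array_interface__['data'][0]

offset = row_addr - base_addr            # one row of 4 items, 8 bytes each
shares = np.shares_memory(x, y)

# Writing through the view changes the original, since there is only one buffer.
y[0, 0] = 100
print(view_addr == base_addr, offset, shares, x[0])
```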
The array data buffer has actual values, not references. Here, with int dtype, each 8 bytes is interpreted as an 'int64' value.
Your id iteration effectively asks for a list, [x[i] for i in range(n)]. An element of an array has to be "unboxed", and is a new object, type np.int64. While not an array, it does have a lot of properties in common with a 1 element array.
In [110]: x[4].__array_interface__
Out[110]:
{'data': (57106480, False),
...
'shape': (),....}
That data value is unrelated to x's.
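Here is a small sketch of what "unboxing" means in practice. It assumes an int64 array (set explicitly so the scalar type is the same on every platform); the names first, second and as_python are only for illustration.

```python
import numpy as np

x = np.arange(12, dtype=np.int64)

# Each indexing operation builds a fresh scalar object from the 8 bytes in the buffer.
first = x[4]
second = x[4]
new_each_time = first is not second
print(type(first), first == second, new_each_time)

# tolist() does the conversion in one step and returns plain Python ints.
as_python = x.tolist()
print(type(as_python[4]))
```

So the id of x[i] describes a temporary scalar object, not a location inside the array's buffer.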
As long as you use numpy methods on existing arrays, speeds are good, often 10x better than equivalent list methods. But if you start with a list, it takes time to make an array. And treating the array like a list (iterating over it element by element) is slow.
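If you want to measure this on your own machine, something like the following rough timing sketch would do. The array size and repeat count are arbitrary choices of mine, and I'm deliberately not quoting numbers, since they vary with hardware and versions.

```python
import timeit
import numpy as np

values = list(range(10_000))
arr = np.array(values, dtype=np.int64)

# Native list summation.
t_list = timeit.timeit(lambda: sum(values), number=20)
# numpy method on an existing array.
t_array_method = timeit.timeit(lambda: arr.sum(), number=20)
# Treating the array like a list: every element gets unboxed.
t_array_as_list = timeit.timeit(lambda: sum(arr), number=20)
# Cost of building the array from the list in the first place.
t_convert = timeit.timeit(lambda: np.array(values, dtype=np.int64), number=20)

for label, t in [("list sum", t_list), ("arr.sum()", t_array_method),
                 ("sum(arr)", t_array_as_list), ("list -> array", t_convert)]:
    print(f"{label:>14}: {t:.5f} s")
```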
And the reverse of x?
In [111]: x[::-1].__array_interface__
Out[111]:
{'data': (57148968, False),
'strides': (-8,),
'descr': [('', '<i8')],
'typestr': '<i8',
'shape': (12,),
'version': 3}
It's a new array object, but with different strides, (-8,), and data now points to the last element of the same buffer, 880+96-8 (that is, ...968).
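The same arithmetic, written as a check. As before this is only a sketch: I fix the dtype to np.int64 so the item size is 8 bytes everywhere, and use .base and np.shares_memory to confirm that the reversed array is a view of x rather than a copy.

```python
import numpy as np

x = np.arange(12, dtype=np.int64)
rev = x[::-1]

start = x.__array_interface__['data'][0]
rev_start = rev.__array_interface__['data'][0]

# The view starts at the last item and steps backwards one item at a time.
print(rev.strides)
print(rev_start - start, x.nbytes - x.itemsize)

is_view = rev.base is x
shares = np.shares_memory(x, rev)
print(is_view, shares)
```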