x and x.data are different types, though they interpret data from the same location in memory:
In [1]: import numpy as np
In [2]: x = np.array([1,2])
In [3]: type(x)
Out[3]: numpy.ndarray
In [4]: type(x.data)
Out[4]: buffer
x.data is a pointer to the underlying buffer of bytes that composes the array object in memory, referenced here in the numpy docs.
When we check the underlying datatype (dtype) that the array uses to store its data, we see the following:
In [5]: x.dtype
Out[5]: dtype('int64')
An int64 is composed of 64 bits, or 8 bytes (8 bits in a byte). Since x holds two elements, the underlying buffer of x, x.data, should be a buffer of length 2 × 8 = 16. We confirm that here:
In [6]: len(x.data)
Out[6]: 16
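The same arithmetic can be checked without touching the buffer itself. This is a minimal sketch, assuming Python 3 and a numpy version that provides the itemsize, size and nbytes attributes. The dtype is set explicitly here (an assumption, the original session relies on the platform default) so that the numbers do not depend on the machine:

```python
import numpy as np

# dtype is given explicitly so the result does not depend on the platform default
x = np.array([1, 2], dtype=np.int64)

bytes_per_element = x.itemsize          # bytes used by one int64
expected_length = x.size * x.itemsize   # number of elements * bytes per element

print(bytes_per_element)   # 8
print(expected_length)     # 16
print(x.nbytes)            # 16, total bytes occupied by the array's data
```

If the array were created as int32 instead, the same calculation would give 2 × 4 = 8 bytes.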
Lastly, we can peek into the actual values of the buffer to see how the values are stored in memory:
In [7]: for i in range(len(x.data)): print ord(x.data[i])
1
0
0
0
0
0
0
0
# first 8 bytes above, second 8 below
2
0
0
0
0
0
0
0
We use ord to return the value of each byte, since indexing the buffer yields each byte as an 8-bit (1-byte) string.
Since each of these bytes stores only 8 bits of information, none of the values printed by the loop will ever exceed 255, the maximum value of a byte.
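The session above is Python 2, where x.data is a buffer. Under Python 3, x.data is a memoryview instead, and iterating over a bytes object already yields integers, so ord is not needed. A rough equivalent might look like the sketch below. The explicit little-endian dtype '<i8' is an assumption made so the byte order is the same on every machine:

```python
import numpy as np

# '<i8' = little-endian 8-byte signed integer (assumed here for reproducibility)
x = np.array([1, 2], dtype='<i8')

raw = x.tobytes()         # a copy of the array's bytes, in memory order
byte_values = list(raw)   # iterating over bytes in Python 3 yields ints in 0..255

print(byte_values[:8])    # [1, 0, 0, 0, 0, 0, 0, 0]
print(byte_values[8:])    # [2, 0, 0, 0, 0, 0, 0, 0]
print(max(byte_values) <= 255)   # True
```

With a big-endian dtype such as '>i8', the nonzero byte would appear last in each group of eight rather than first.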
The link between x and x.data is that x.data points to the location in memory of the values you see when you inspect x. numpy uses the ndarray type as an abstraction on top of this lower-level storage in memory to make it easy to deal with arrays at a high level, like getting the value of x at index one:
In [8]: x[1]
Out[8]: 2
instead of needing to implement the correct offsetting and binary-to-integer conversion yourself.
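To make the offsetting and conversion concrete, here is a minimal sketch of what doing it by hand could look like, assuming Python 3 and a little-endian int64 array. The helper read_int64 is hypothetical and written only for illustration; it is not part of numpy:

```python
import numpy as np

# '<i8' = little-endian 8-byte signed integer (assumed for reproducibility)
x = np.array([1, 2], dtype='<i8')
raw = x.tobytes()   # copy of the 16 underlying bytes


def read_int64(buf, index, itemsize=8):
    """Hypothetical helper: decode the little-endian int64 at position `index`."""
    offset = index * itemsize               # where the element starts in the buffer
    chunk = buf[offset:offset + itemsize]   # the 8 bytes belonging to that element
    return int.from_bytes(chunk, byteorder='little', signed=True)


manual = read_int64(raw, 1)
print(manual)      # 2
print(int(x[1]))   # 2, the same value numpy returns directly
```

Indexing with x[1] hides both steps: computing the byte offset from the index and item size, and decoding those bytes according to the dtype.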