Say I have a large NumPy array of dtype `int32`:

```python
import numpy as np
N = 1000  # (large) number of elements
a = np.random.randint(0, 100, N, dtype=np.int32)
```

but now I want the data to be `uint32`. I could do

```python
b = a.astype(np.uint32)
```

or even

```python
b = a.astype(np.uint32, copy=False)
```

but in both cases `b` is a copy of `a`, whereas I want to simply reinterpret the data in `a` as being `uint32`, so as not to duplicate the memory. Similarly, using `np.asarray()` does not help.
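As a quick sanity check (my own sketch; `np.shares_memory` is NumPy's helper for testing whether two arrays overlap in memory), all three of these conversions really do end up with a separate buffer:

```python
import numpy as np

a = np.random.randint(0, 100, 1000).astype(np.int32)

# astype copies by default
b = a.astype(np.uint32)
print(np.shares_memory(a, b))  # False: b has its own buffer

# copy=False only avoids the copy when the dtype already matches;
# here the dtype differs, so a copy is still made
c = a.astype(np.uint32, copy=False)
print(np.shares_memory(a, c))  # False

# np.asarray with a different dtype also copies
d = np.asarray(a, dtype=np.uint32)
print(np.shares_memory(a, d))  # False
```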
What does work is

```python
a.dtype = np.uint32
```

which simply changes the `dtype` without altering the data at all. Here's a striking example:

```python
import numpy as np
a = np.array([-1, 0, 1, 2], dtype=np.int32)
print(a)
a.dtype = np.uint32
print(a)  # shows "overflow", which is what I want
```
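To convince myself that nothing is reallocated here, I compared the buffer address (via `ndarray.ctypes.data`) before and after the reassignment; this is a quick check I put together, not something from the docs:

```python
import numpy as np

a = np.array([-1, 0, 1, 2], dtype=np.int32)
addr_before = a.ctypes.data  # address of the underlying buffer

a.dtype = np.uint32  # reinterpret the same bytes in place

print(a.ctypes.data == addr_before)  # True: same buffer, no reallocation
print(a)  # [4294967295 0 1 2] -- the int32 value -1 read as uint32
```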
My questions are about the solution of simply overwriting the `dtype` of the array:
- Is this legitimate? Can you point me to where this feature is documented?
- Does it in fact leave the data of the array untouched, i.e. no duplication of the data?
- What if I want two arrays `a` and `b` sharing the same data, but viewing it as different `dtype`s? I've found the following to work, but again I'm concerned if this is really OK to do:

  ```python
  import numpy as np
  a = np.array([0, 1, 2, 3], dtype=np.int32)
  b = a.view(np.uint32)
  print(a)  # [0 1 2 3]
  print(b)  # [0 1 2 3]
  a[0] = -1
  print(a)  # [-1 1 2 3]
  print(b)  # [4294967295 1 2 3]
  ```

  Though this seems to work, I find it weird that the underlying `data` of the two arrays does not seem to be located in the same place in memory:

  ```python
  print(a.data)
  print(b.data)
  ```

  Actually, it seems that the above gives different results each time it is run, so I don't understand what's going on there at all.
- This can be extended to other `dtype`s, the most extreme of which is probably mixing 32- and 64-bit floats:

  ```python
  import numpy as np
  a = np.array([0, 1, 2, np.pi], dtype=np.float32)
  b = a.view(np.float64)
  print(a)  # [0. 1. 2. 3.1415927]
  print(b)  # [0.0078125 50.12387848]
  b[0] = 8
  print(a)  # [0. 2.5 2. 3.1415927]
  print(b)  # [8. 50.12387848]
  ```

  Again, is this condoned, if the obtained behaviour is really what I'm after?
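For what it's worth, I also tried inspecting the buffers directly (a sketch of my own, using `np.shares_memory` and `ndarray.ctypes.data` rather than `a.data`), and that suggests the views do share one buffer; the confusing output above seems to come from `a.data` returning a fresh `memoryview` wrapper object on each access:

```python
import numpy as np

a = np.array([0, 1, 2, 3], dtype=np.int32)
b = a.view(np.uint32)

# a.data creates a new memoryview wrapper each time it is accessed; its
# repr shows the id of that wrapper object, not the buffer address,
# which is presumably why it differs between runs.
print(a.ctypes.data == b.ctypes.data)  # True: same buffer start address
print(np.shares_memory(a, b))          # True: same underlying memory
print(b.base is a)                     # True: b is a view onto a

# The float32 -> float64 view reinterprets pairs of 4-byte elements as
# single 8-byte elements, so the length is halved:
f = np.array([0, 1, 2, np.pi], dtype=np.float32)
g = f.view(np.float64)
print(g.shape)                         # (2,)
print(np.shares_memory(f, g))          # True
```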