How to change numpy array dtype and reshape?

Question

I have an array that I read from an HDF5 file, and it is a 1D array of tuples. Its dtype is:

[('cycle', '<u2'), ('dxn', 'i1'), ('i (mA)', '<f4'), ('V', '<f4'), ('R(Ohm)', '<f4')]

I would like to convert this from an n x 1 array into a (n/5) x 5 array of type np.float.

I tried np.astype but that does not work--it returns only n elements. Any easy way to do this?

It is a structured array http://docs.scipy.org/doc/numpy-1.10.1/user/basics.rec.html you can't upcaste unicode (cycle) to float, but you have 3 float fields ( — , Jun 28 '16 at 18:43
@DanPatterson FYI: The `u2` dtype is a 16 bit unsigned integer (numpy.uint16), not unicode. — Warren Weckesser, Jun 28 '16 at 19:09
This has been asked and answered before: http://stackoverflow.com/questions/5957380/convert-structured-array-to-regular-numpy-array — Alicia Garcia-Raboso, Jun 28 '16 at 20:10
@WarrenWeckesser misread rec.array(('Ah', 1, 'goofed'), dtype=[('f0', ' — , Jun 28 '16 at 23:47
Guys, thanks a lot. I learned about _structured arrays_ and I now realize that I can handle those just fine in `numpy`. So I don't plan to convert them anymore. — germ, Jun 30 '16 at 04:24

score -1 · Answer 1 · edited May 23 '17 at 12:16

The mix of dtypes makes this conversion trickier than usual. The approach at the end, copying fields into a target array, has the best combination of speed and generality.

The question "Convert structured array to regular NumPy array" was suggested as a duplicate, but in that case all the fields are floats.

Let's construct a sample:

In [850]: dt
Out[850]: dtype([('cycle', '<u2'), ('dxn', 'i1'), ('i (mA)', '<f4'), ('V', '<f4'), ('R(Ohm)', '<f4')])

In [851]: x=np.zeros((3,),dt)
In [852]: x['cycle']=[0,10,23]
In [853]: x['dxn']=[3,2,2]
In [854]: x['V']=[1,1,1]

In [855]: x
Out[855]: 
array([(0, 3, 0.0, 1.0, 0.0), (10, 2, 0.0, 1.0, 0.0),
       (23, 2, 0.0, 1.0, 0.0)], 
      dtype=[('cycle', '<u2'), ('dxn', 'i1'), ('i (mA)', '<f4'), ('V', '<f4'), ('R(Ohm)', '<f4')])
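The session above shows dt without its definition. A minimal standalone sketch that rebuilds the same sample, assuming dt was created with np.dtype from the field list in the question, might look like this:

```python
import numpy as np

# Assumed definition of dt, copied from the dtype given in the question.
dt = np.dtype([('cycle', '<u2'), ('dxn', 'i1'), ('i (mA)', '<f4'),
               ('V', '<f4'), ('R(Ohm)', '<f4')])

# Three zero-filled records, then fill a few fields by name.
x = np.zeros((3,), dt)
x['cycle'] = [0, 10, 23]
x['dxn'] = [3, 2, 2]
x['V'] = [1, 1, 1]

print(x)
```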

We can view the 3 float fields in ways suggested in that link:

In [856]: dt1=np.dtype([('f0','float32',(3))])

In [857]: y=x[list(x.dtype.names[2:])].view(dt1)
# or x[list(x.dtype.names[2:])].view((np.float32, 3))

In [858]: y
Out[858]: 
array([([0.0, 1.0, 0.0],), ([0.0, 1.0, 0.0],), ([0.0, 1.0, 0.0],)], 
      dtype=[('f0', '<f4', (3,))])

In [859]: y['f0']
Out[859]: 
array([[ 0.,  1.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  1.,  0.]], dtype=float32)

But y must be a copy if I want to change all of its values, because writing to multiple fields at once is not allowed.

In [863]: y=x[list(x.dtype.names[2:])].view(dt1).copy()
In [864]: y['f0']=np.arange(9.).reshape(3,3)

A view with a single scalar dtype does not capture the row structure, so we would have to add it back with reshape. Using dt1 with its (3,) shape gets around that issue.

In [867]: x[list(x.dtype.names[2:])].view(np.float32)
Out[867]: array([ 0.,  1.,  0.,  0.,  1.,  0.,  0.,  1.,  0.], dtype=float32)
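The view tricks above depend on how the NumPy version in use lays out a multi-field selection. As far as I know, from NumPy 1.16 on such a selection keeps the original field offsets, so a plain .view may fail with an itemsize mismatch. A hedged alternative sketch, assuming a NumPy recent enough to provide recfunctions.structured_to_unstructured, extracts the three float fields without relying on the memory layout:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

dt = np.dtype([('cycle', '<u2'), ('dxn', 'i1'), ('i (mA)', '<f4'),
               ('V', '<f4'), ('R(Ohm)', '<f4')])
x = np.zeros((3,), dt)
x['V'] = [1, 1, 1]

# Select the three float32 fields by name, then convert them to a plain 2-D array.
float_names = list(x.dtype.names[2:])
y = rfn.structured_to_unstructured(x[float_names])

print(y.shape, y.dtype)
```

Since all three selected fields are float32, the result keeps that dtype, with one row per record.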

https://stackoverflow.com/a/5957455/901925 suggests going through a list.

In [868]: x.tolist()
Out[868]: [(0, 3, 0.0, 1.0, 0.0), (10, 2, 0.0, 1.0, 0.0), (23, 2, 0.0, 1.0, 0.0)]

In [869]: np.array(x.tolist())
Out[869]: 
array([[  0.,   3.,   0.,   1.,   0.],
       [ 10.,   2.,   0.,   1.,   0.],
       [ 23.,   2.,   0.,   1.,   0.]])
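The list route returns float64 by default. If float32 is wanted, as in the other examples here, the dtype can be passed to np.array directly. A small sketch, rebuilding the same sample so that it runs on its own:

```python
import numpy as np

dt = np.dtype([('cycle', '<u2'), ('dxn', 'i1'), ('i (mA)', '<f4'),
               ('V', '<f4'), ('R(Ohm)', '<f4')])
x = np.zeros((3,), dt)
x['cycle'] = [0, 10, 23]
x['dxn'] = [3, 2, 2]
x['V'] = [1, 1, 1]

# tolist() gives a list of tuples; np.array turns it into a 2-D array
# and casts every value to the requested dtype.
a = np.array(x.tolist(), dtype=np.float32)

print(a.shape, a.dtype)
```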

Individual fields can be converted with astype:

In [878]: x['cycle'].astype(np.float32)
Out[878]: array([  0.,  10.,  23.], dtype=float32)

In [879]: x['dxn'].astype(np.float32)
Out[879]: array([ 3.,  2.,  2.], dtype=float32)

but astype on the whole array does not convert multiple fields; here it returns only the first one:

In [880]: x.astype(np.float32)
Out[880]: array([  0.,  10.,  23.], dtype=float32)

recfunctions helps manipulate structured arrays (and recarrays):

from numpy.lib import recfunctions

Many of them construct a new empty structure, and copy values field by field. The equivalent in this case:

In [890]: z=np.zeros((3,5),np.float32)    
In [891]: for i in range(5):
   .....:     z[:,i] = x[x.dtype.names[i]]

In [892]: z
Out[892]: 
array([[  0.,   3.,   0.,   1.,   0.],
       [ 10.,   2.,   0.,   1.,   0.],
       [ 23.,   2.,   0.,   1.,   0.]], dtype=float32)

In this small case it is a bit slower than np.array(x.tolist()), but for 30000 records it is much faster.

Usually there are many more records than fields in a structured array, so iteration on fields is not slow.
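The field-by-field copy can be wrapped in a small helper so it works for any number of fields. This is only a sketch; the name fields_to_2d is made up for illustration and is not part of NumPy:

```python
import numpy as np

def fields_to_2d(arr, dtype=np.float32):
    """Copy each field of a 1-D structured array into one column of a 2-D array.

    Hypothetical helper, not a NumPy function.
    """
    names = arr.dtype.names
    out = np.empty((arr.shape[0], len(names)), dtype=dtype)
    for col, name in enumerate(names):
        out[:, col] = arr[name]  # the assignment casts this field to `dtype`
    return out

dt = np.dtype([('cycle', '<u2'), ('dxn', 'i1'), ('i (mA)', '<f4'),
               ('V', '<f4'), ('R(Ohm)', '<f4')])
x = np.zeros((3,), dt)
x['cycle'] = [0, 10, 23]
x['dxn'] = [3, 2, 2]
x['V'] = [1, 1, 1]

z = fields_to_2d(x)
print(z)
```

The loop runs once per field rather than once per record, which is why it stays cheap for arrays with many records and few fields.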

many thanks for taking the time to compose your elaborate answer. The last block of code in your answer is the one I understand. It makes perfect sense to me and is what I will use should I have the need for it in the future. You are right, I have several thousand records (but only 5 columns) in my data sets. — germ, Jun 30 '16 at 04:27
