
I have an array that I read from an HDF5 file, and it is a 1D array of tuples. Its dtype is:

[('cycle', '<u2'), ('dxn', 'i1'), ('i (mA)', '<f4'), ('V', '<f4'), ('R(Ohm)', '<f4')] 

I would like to convert this from an n x 1 array into a (n/5) x 5 array of type np.float.

I tried np.astype but that does not work--it returns only n elements. Any easy way to do this?

germ
  • It is a structured array, see http://docs.scipy.org/doc/numpy-1.10.1/user/basics.rec.html. You can't upcast unicode (cycle) to float, but you have 3 float fields ( –  Jun 28 '16 at 18:43
  • @DanPatterson FYI: The `u2` dtype is a 16 bit unsigned integer (numpy.uint16), not unicode. – Warren Weckesser Jun 28 '16 at 19:09
  • This has been asked and answered before: http://stackoverflow.com/questions/5957380/convert-structured-array-to-regular-numpy-array – Alicia Garcia-Raboso Jun 28 '16 at 20:10
  • @WarrenWeckesser misread rec.array(('Ah', 1, 'goofed'), dtype=[('f0', ' –  Jun 28 '16 at 23:47
  • Guys, thanks a lot. I learned about _structured arrays_ and I now realize that I can handle those just fine in `numpy`. So I don't plan to convert them anymore. – germ Jun 30 '16 at 04:24

1 Answer


The mix of dtypes makes this conversion trickier than usual. The approach at the end of this answer, copying the fields one by one into a target array, offers the best combination of speed and generality.

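The question "Convert structured array to regular NumPy array" (http://stackoverflow.com/questions/5957380/convert-structured-array-to-regular-numpy-array) was suggested as a duplicate, but in that question all of the fields are floats.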

Let's construct a sample:

In [850]: dt
Out[850]: dtype([('cycle', '<u2'), ('dxn', 'i1'), ('i (mA)', '<f4'), ('V', '<f4'), ('R(Ohm)', '<f4')])

In [851]: x=np.zeros((3,),dt)
In [852]: x['cycle']=[0,10,23]
In [853]: x['dxn']=[3,2,2]
In [854]: x['V']=[1,1,1]

In [855]: x
Out[855]: 
array([(0, 3, 0.0, 1.0, 0.0), (10, 2, 0.0, 1.0, 0.0),
       (23, 2, 0.0, 1.0, 0.0)], 
      dtype=[('cycle', '<u2'), ('dxn', 'i1'), ('i (mA)', '<f4'), ('V', '<f4'), ('R(Ohm)', '<f4')])

We can view the 3 float fields in ways suggested in that link:

In [856]: dt1=np.dtype([('f0','float32',(3))])

In [857]: y=x[list(x.dtype.names[2:])].view(dt1)
# or x[list(x.dtype.names[2:])].view((np.float32, 3))

In [858]: y
Out[858]: 
array([([0.0, 1.0, 0.0],), ([0.0, 1.0, 0.0],), ([0.0, 1.0, 0.0],)], 
      dtype=[('f0', '<f4', (3,))])

In [859]: y['f0']
Out[859]: 
array([[ 0.,  1.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  1.,  0.]], dtype=float32)

But I need to make y a copy if I want to change all the values. Writing to multiple fields at a time is not allowed.

In [863]: y=x[list(x.dtype.names[2:])].view(dt1).copy()
In [864]: y['f0']=np.arange(9.).reshape(3,3)

Viewing with a plain scalar dtype does not preserve the row structure; we have to add it back with reshape. Using dt1, whose single field has shape (3,), gets around that issue.

In [867]: x[list(x.dtype.names[2:])].view(np.float32)
Out[867]: array([ 0.,  1.,  0.,  0.,  1.,  0.,  0.,  1.,  0.], dtype=float32)
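In more recent NumPy versions, indexing with a list of field names can return a view that keeps the original record layout (including the bytes of the skipped fields), in which case the direct view above may raise an error about mismatched sizes. Here is a minimal sketch of the view-plus-reshape route under that assumption. It assumes a NumPy release that provides recfunctions.repack_fields, and it rebuilds the sample array so the snippet runs on its own; the names float_names, packed and floats_2d are only illustrative.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

dt = np.dtype([('cycle', '<u2'), ('dxn', 'i1'), ('i (mA)', '<f4'),
               ('V', '<f4'), ('R(Ohm)', '<f4')])
x = np.zeros(3, dtype=dt)
x['V'] = [1, 1, 1]

# Names of the three float32 fields
float_names = list(x.dtype.names[2:])

# Repack the selected fields into a contiguous record (3 x float32, 12 bytes)
# so that the record size matches what the float32 view expects.
packed = rfn.repack_fields(x[float_names])

# A flat float32 view loses the row structure; reshape restores it.
floats_2d = packed.view(np.float32).reshape(len(x), -1)
print(floats_2d.shape)
```

Because repack_fields returns a packed copy here, writing to floats_2d does not modify x.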

https://stackoverflow.com/a/5957455/901925 suggests going through a list.

In [868]: x.tolist()
Out[868]: [(0, 3, 0.0, 1.0, 0.0), (10, 2, 0.0, 1.0, 0.0), (23, 2, 0.0, 1.0, 0.0)]

In [869]: np.array(x.tolist())
Out[869]: 
array([[  0.,   3.,   0.,   1.,   0.],
       [ 10.,   2.,   0.,   1.,   0.],
       [ 23.,   2.,   0.,   1.,   0.]])
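The result above is float64, because np.array picks its own dtype for the mixed Python ints and floats. A small sketch of the same list route with an explicit target dtype is given below; the variable names as_f64 and as_f32 are only illustrative, and the sample array is rebuilt so the snippet is self-contained.

```python
import numpy as np

dt = np.dtype([('cycle', '<u2'), ('dxn', 'i1'), ('i (mA)', '<f4'),
               ('V', '<f4'), ('R(Ohm)', '<f4')])
x = np.zeros(3, dtype=dt)
x['cycle'] = [0, 10, 23]
x['dxn'] = [3, 2, 2]
x['V'] = [1, 1, 1]

# tolist() turns each record into a tuple of Python scalars;
# np.array then builds a regular 2D array from the list of tuples.
as_f64 = np.array(x.tolist())                    # dtype chosen by NumPy
as_f32 = np.array(x.tolist(), dtype=np.float32)  # explicit target dtype

print(as_f64.dtype, as_f32.dtype, as_f32.shape)
```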

Individual fields can be converted with astype:

In [878]: x['cycle'].astype(np.float32)
Out[878]: array([  0.,  10.,  23.], dtype=float32)

In [879]: x['dxn'].astype(np.float32)
Out[879]: array([ 3.,  2.,  2.], dtype=float32)

but calling astype on the whole array does not convert every field; here it returns only the first one:

In [880]: x.astype(np.float32)
Out[880]: array([  0.,  10.,  23.], dtype=float32)
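Since astype works field by field, one option is to convert each field separately and stack the results as columns. This is only a sketch of that idea using np.column_stack; the names cols and z are illustrative, and the sample array is rebuilt so the snippet runs on its own.

```python
import numpy as np

dt = np.dtype([('cycle', '<u2'), ('dxn', 'i1'), ('i (mA)', '<f4'),
               ('V', '<f4'), ('R(Ohm)', '<f4')])
x = np.zeros(3, dtype=dt)
x['cycle'] = [0, 10, 23]
x['dxn'] = [3, 2, 2]
x['V'] = [1, 1, 1]

# Convert every field to float32 on its own ...
cols = [x[name].astype(np.float32) for name in x.dtype.names]

# ... then join the 1D arrays as the columns of a 2D array.
z = np.column_stack(cols)
print(z.shape, z.dtype)
```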

The recfunctions module helps manipulate structured arrays (and recarrays):

from numpy.lib import recfunctions

Many of them construct a new empty structure, and copy values field by field. The equivalent in this case:

In [890]: z=np.zeros((3,5),np.float32)    
In [891]: for i in range(5):
   .....:     z[:,i] = x[x.dtype.names[i]]

In [892]: z
Out[892]: 
array([[  0.,   3.,   0.,   1.,   0.],
       [ 10.,   2.,   0.,   1.,   0.],
       [ 23.,   2.,   0.,   1.,   0.]], dtype=float32)

In this small case it is a bit slower than np.array(x.tolist()). But for 30000 records this is much faster.

Usually there are many more records than fields in a structured array, so iteration on fields is not slow.
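The field-by-field copy can be wrapped in a small helper so it works for any number of fields and records. The sketch below does that; fields_to_2d is a hypothetical name, not a NumPy function. As a cross-check it compares the result with recfunctions.structured_to_unstructured, which is assumed to be available (it was added in newer NumPy releases than the one used in the session above).

```python
import numpy as np
from numpy.lib import recfunctions as rfn


def fields_to_2d(arr, dtype=np.float32):
    """Copy each field of a 1D structured array into one column of a 2D array."""
    names = arr.dtype.names
    out = np.empty((len(arr), len(names)), dtype=dtype)
    for i, name in enumerate(names):
        # Assignment casts the field's values to the target dtype.
        out[:, i] = arr[name]
    return out


dt = np.dtype([('cycle', '<u2'), ('dxn', 'i1'), ('i (mA)', '<f4'),
               ('V', '<f4'), ('R(Ohm)', '<f4')])
x = np.zeros(30000, dtype=dt)
x['cycle'] = np.arange(30000) % 100
x['V'] = 1.0

z = fields_to_2d(x)

# Library routine for the same conversion (assumes a NumPy version that has it).
z2 = rfn.structured_to_unstructured(x, dtype=np.float32)

print(z.shape, np.array_equal(z, z2))
```

The loop runs once per field rather than once per record, which is why it stays cheap when there are many records and few fields.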

hpaulj
  • many thanks for taking the time to compose your elaborate answer. The last block of code in your answer is the one I understand. It makes perfect sense to me and is what I will use should I have the need for it in the future. You are right, I have several thousand records (but only 5 columns) in my data sets. – germ Jun 30 '16 at 04:27