Duplicate values on Numpy array when setting dtype

Question

I have two one-dimensional arrays that we will call x and y, for convenience:

x = np.array([1., 3.])
y = np.array([2, 4])

And I want to concatenate them into a structured array. The desired output is:

array([( 1., 2), ( 3., 4)], 
      dtype=[('x', '<f8'), ('y', '<i8')])

But by doing:

my_array = np.array([x, y]).T
my_array = my_array.astype([('x', float), ('y', int)])

I get the following:

array([[( 1., 1), ( 2., 2)],
       [( 3., 3), ( 4., 4)]], 
      dtype=[('x', '<f8'), ('y', '<i8')])

https://stackoverflow.com/questions/3622850/converting-a-2d-numpy-array-to-a-structured-array shows how to construct a `recarray` with `fromrecords`. But the problem is better solved with `fromarrays`. The `x` and `y` here are fields, not records. — hpaulj, Jun 02 '17 at 21:17

score 3 · Accepted Answer · answered Jun 02 '17 at 19:42

You can use np.rec.fromarrays:

np.rec.fromarrays([x, y], dtype=[('x', '<f8'), ('y', '<i8')])
# rec.array([( 1., 2), ( 3., 4)], 
#           dtype=[('x', '<f8'), ('y', '<i8')])
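
To see why the attempt in the question duplicates values, here is a minimal sketch that runs both approaches side by side. The variable names `dt`, `wrong`, and `rec` are only illustrative. The transposed array is a plain 2-D float array, so `astype` converts each scalar into a whole record and copies that one value into every field; `fromarrays` instead treats each input array as one field.

```python
import numpy as np

x = np.array([1., 3.])
y = np.array([2, 4])
dt = [('x', '<f8'), ('y', '<i8')]

# The approach from the question: astype on a (2, 2) float array turns
# every scalar into a full record, so each value fills both fields.
wrong = np.array([x, y]).T.astype(dt)
print(wrong.shape)      # still two-dimensional

# fromarrays treats each input array as one field instead.
rec = np.rec.fromarrays([x, y], dtype=dt)
print(rec.shape)        # one record per element of x
print(rec.x, rec['y'])  # fields are reachable by attribute or by name
```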

Psidom

hpaulj · Answer 2 · 2017-06-02T21:29:42.270

The linked question, "Converting a 2D numpy array to a structured array", isn't quite a duplicate. There the starting point is a set of records:

[("Hello",2.5,3),("World",3.6,2)]

The accepted solution there uses np.rec.fromarrays, but it has to transpose the input. The shorter solution uses np.rec.fromrecords.

But a look at the code for fromarrays suggests a simple way to do this, especially if you can't recall where all these recarray functions are hiding.

In [200]: x = np.array([1., 3.])
     ...: y = np.array([2, 4])

In [201]: dt = [('x', '<f8'), ('y', '<i8')]

In [204]: arr = np.empty(x.shape[0], dtype=dt)
In [205]: for n, v in zip(arr.dtype.names, [x, y]):
     ...:     arr[n] = v

In [206]: arr
Out[206]: 
array([( 1., 2), ( 3., 4)], 
      dtype=[('x', '<f8'), ('y', '<i8')])

Like many of the recfunctions, fromarrays creates a new blank array of the desired shape and dtype, then copies the values in by field name.
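
The same copy-by-field idea can be wrapped in a small function for use outside an interactive session. This is only a sketch: `fields_to_structured` is a hypothetical helper name, not a NumPy function, and it assumes every input array has the same shape.

```python
import numpy as np

def fields_to_structured(arrays, dtype):
    """Hypothetical helper: build a structured array one field at a time."""
    dtype = np.dtype(dtype)
    arrays = [np.asarray(a) for a in arrays]
    if len(arrays) != len(dtype.names):
        raise ValueError("need exactly one array per field")
    # Blank array with one record per element of the first input.
    out = np.empty(arrays[0].shape, dtype=dtype)
    for name, values in zip(dtype.names, arrays):
        out[name] = values  # values are cast to the field's dtype here
    return out

x = np.array([1., 3.])
y = np.array([2, 4])
arr = fields_to_structured([x, y], [('x', '<f8'), ('y', '<i8')])
print(arr)
```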

The fromrecords solution, though, suggests a different approach: use zip to transpose the arrays.

In [210]: list(zip(*[x,y]))
Out[210]: [(1.0, 2), (3.0, 4)]

This is a list of tuples, so I can use it directly in a structured array creation statement:

In [212]: np.array(_, dtype=dt)
Out[212]: 
array([( 1., 2), ( 3., 4)], 
      dtype=[('x', '<f8'), ('y', '<i8')])
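
In the session above, `_` is IPython's shorthand for the previous output, so it will not work in a plain script. A minimal standalone sketch of the same zip approach, with `records` as an illustrative variable name:

```python
import numpy as np

x = np.array([1., 3.])
y = np.array([2, 4])
dt = [('x', '<f8'), ('y', '<i8')]

# zip pairs up the i-th element of each array, giving one tuple per record.
records = list(zip(x, y))

# A list of tuples is the input form NumPy expects for a structured dtype.
arr = np.array(records, dtype=dt)
print(arr)
```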

Copying by field should be the faster of the two, assuming the real array has many more records than fields: the loop runs once per field, while the zip approach builds a Python tuple for every record.
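
One way to check the speed claim on your own data is a rough timeit comparison. This is a sketch under assumptions: the array size `n` and the repeat count are arbitrary, the function names are illustrative, and the timings will vary by machine and NumPy version, so none are quoted here.

```python
import timeit
import numpy as np

n = 20_000  # arbitrary size, chosen so records far outnumber fields
x = np.arange(n, dtype=float)
y = np.arange(n)
dt = [('x', '<f8'), ('y', '<i8')]

def by_field():
    # One vectorised assignment per field.
    arr = np.empty(x.shape[0], dtype=dt)
    for name, values in zip(arr.dtype.names, [x, y]):
        arr[name] = values
    return arr

def by_tuples():
    # One Python tuple per record, then a single array construction.
    return np.array(list(zip(x, y)), dtype=dt)

print("by field: ", timeit.timeit(by_field, number=5))
print("by tuples:", timeit.timeit(by_tuples, number=5))
```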
