Creating a np.void object of mixed data type, to use in np.full

Question

I want to make an array that is filled with a particular value. For simple data types, I can do this with np.full. For example, the following code will generate an array of length 10, where each value is the 64-bit integer -1:

import numpy as np
arr = np.full((10,), -1, np.int64)

But I have more complicated, mixed, array data types. For example, I'd expect the following code to work:

import numpy as np
data_type = [("value_1", np.int64), ("value_2", np.float64)]
default = (-1, np.nan)
arr = np.full((10,), default, data_type)

This gives ValueError: could not broadcast input array from shape (2) into shape (10). I know why (it tries putting each value of my default into a separate element of my array), but that isn't what I want it to do (putting my entire default into each element of the array).

I'd be able to get around this by making my default something that numpy recognizes to be a single element. For example, this works:

default_array = np.array([default], data_type)
new_default = default_array[0]
arr = np.full((10,), new_default, data_type)

But this is sure to confuse any future readers of my code, myself included.

Now on to my actual question: Is there any way to make this new_default object without going through the hoop of first creating an array?

The new_default object is of type numpy.void, but I can't seem to create my own such object through, e.g. np.void(default).

look at the python code `np.full` – hpaulj, Apr 25 '19 at 10:35

hpaulj · Answer 1 · 2019-04-25T16:44:11.200

The short answer - don't use np.full to construct a structured array. Make the blank array, and assign the value with arr[:] = default_tuple.

It's the copyto call that has problems broadcasting the default:

In [596]: np.full(3,default)                                                         
---------------------------------------------------------------------------
/usr/local/lib/python3.6/dist-packages/numpy/core/numeric.py in full(shape, fill_value, dtype, order)
    334         dtype = array(fill_value).dtype
    335     a = empty(shape, dtype, order)
--> 336     multiarray.copyto(a, fill_value, casting='unsafe')
    337     return a
    338 

ValueError: could not broadcast input array from shape (2) into shape (3)

Evidently copyto takes the fill_value, converts it to an array (the "input array from shape (2)" in the message), and attempts to broadcast it to the target.

In [599]: np.array(default)                                                          
Out[599]: array([-1., nan])
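A small sketch of that behaviour, assuming no dtype is passed to np.full (the variable names are only for illustration). It shows that the tuple is read as a length-2 float array, so a length-2 target is silently filled element by element, while any other length fails to broadcast:

```python
import numpy as np

default = (-1, np.nan)

# Without a structured dtype, the tuple becomes a 1-D float array of length 2.
as_array = np.array(default)
print(as_array.shape)

# A length-2 target "works", but the tuple is spread across the two elements.
spread = np.full(2, default)
print(spread.dtype, spread.shape)

# Any other length cannot be broadcast from shape (2,).
try:
    np.full(3, default)
    broadcast_failed = False
except ValueError:
    broadcast_failed = True
print(broadcast_failed)
```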

But indexed assignment takes the tuple default just fine:

In [589]: arr = np.zeros(3, dtype=data_type)                                         
In [590]: arr[:] = default                                                           
In [591]: arr                                                                        
Out[591]: 
array([(-1, nan), (-1, nan), (-1, nan)],
      dtype=[('value_1', '<i8'), ('value_2', '<f8')])

This passes the tuple to arr unchanged. The standard data input for a structured array is a list of tuples:

In [600]: np.array([default,default,default], dtype=data_type)                       
Out[600]: 
array([(-1, nan), (-1, nan), (-1, nan)],
      dtype=[('value_1', '<i8'), ('value_2', '<f8')])
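If the create-then-assign pattern is needed in several places, it could be wrapped in a small helper. The function name full_structured is hypothetical and not part of NumPy; this is only a sketch of the indexed-assignment approach shown above:

```python
import numpy as np

def full_structured(shape, fill_value, dtype):
    """Hypothetical helper: structured array with every record set to fill_value."""
    out = np.empty(shape, dtype=dtype)
    # Indexed assignment interprets the tuple as one record of `dtype`
    # and broadcasts that record over the whole array.
    out[...] = fill_value
    return out

data_type = [("value_1", np.int64), ("value_2", np.float64)]
default = (-1, np.nan)

arr = full_structured((10,), default, data_type)
print(arr[:2])
```

This keeps the call site as short as np.full while making the intent explicit for later readers.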

Yes, the type of an element of a structured array is np.void, but as far as I know, np.void can't be used as an object constructor. There's no such documentation, and my experiments have failed.
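For readers on newer NumPy releases: as far as I can tell from the np.void documentation, versions 1.24 and later accept a dtype keyword, which allows building the scalar directly. The sketch below assumes that behaviour and falls back to a 0-d structured array otherwise; make_record is a hypothetical name, not a NumPy function.

```python
import numpy as np

def make_record(values, dtype):
    """Hypothetical helper returning a np.void scalar of the given structured dtype."""
    dtype = np.dtype(dtype)
    try:
        # Assumption: NumPy >= 1.24, where np.void takes a dtype keyword.
        record = np.void(values, dtype=dtype)
    except (TypeError, ValueError):
        record = None
    if not isinstance(record, np.void) or record.dtype != dtype:
        # Older NumPy: index a 0-d structured array to get the scalar.
        record = np.array(values, dtype=dtype)[()]
    return record

data_type = [("value_1", np.int64), ("value_2", np.float64)]
default = (-1, np.nan)

new_default = make_record(default, data_type)
arr = np.full((10,), new_default, data_type)
print(type(new_default), arr.shape)
```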

But then, np.float32(23), while it works, isn't good numpy practice. I see it more often in beginners' code than in numpy functions.

To me, making a single-element structured array is perfectly normal:

In [573]: x=np.array(default, dtype=data_type)                                       
In [574]: x                                                                          
Out[574]: array((-1, nan), dtype=[('value_1', '<i8'), ('value_2', '<f8')])
In [575]: type(x[()])                                                                
Out[575]: numpy.void
In [576]: x.item()                                                                   
Out[576]: (-1, nan)

In [577]: np.full(3, x)      # full can deduce dtype from fill_value                                                            
Out[577]: 
array([(-1, nan), (-1, nan), (-1, nan)],
      dtype=[('value_1', '<i8'), ('value_2', '<f8')])

Concatenation requires the same thing - creating an object of matching dtype:

In [583]: np.hstack((arr,default))                                                   
---------------------------------------------------------------------------
TypeError: invalid type promotion

In [584]: np.hstack((arr,x))                                                         
Out[584]: 
array([(-1, nan), (-1, nan), (-1, nan), (-1, nan)],
      dtype=[('value_1', '<i8'), ('value_2', '<f8')])
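The same idea written with np.concatenate, as a minimal sketch: the tuple is first wrapped in a one-element array that reuses the dtype of the existing array, so no type promotion is needed. The names extra and longer are illustrative only.

```python
import numpy as np

data_type = [("value_1", np.int64), ("value_2", np.float64)]
default = (-1, np.nan)

arr = np.zeros(3, dtype=data_type)
arr[:] = default

# Wrap the tuple in a 1-element array of the same dtype before joining.
extra = np.array([default], dtype=arr.dtype)
longer = np.concatenate((arr, extra))
print(longer.shape)
```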

score 0 · Answer 2 · answered Apr 25 '19 at 12:05

I don't think that np.full can handle more than one default; the documentation states: "fill_value : scalar". I think you're better off creating two separate arrays and merging them afterwards.

You can, however, pass the various dtypes, to get at least that part in one go, e.g.:

arr=np.full(10,-1,'|S4, (2,1)i4, f8')

or

dts=np.dtype([('f1', np.int64), ('f2', np.float64)])
arr=np.full(10,-1,dts)

See here for the relevant documentation, also have a look at this general discussion on performance.
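A variation on the separate-arrays idea, sketched under the assumption that each field should get its own default: allocate the structured array once and fill it field by field, which avoids the merge step. The field names match the dts example above.

```python
import numpy as np

dts = np.dtype([("f1", np.int64), ("f2", np.float64)])

# Allocate once, then give each field its own default value.
arr = np.empty(10, dtype=dts)
arr["f1"] = -1
arr["f2"] = np.nan
print(arr[:2])
```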
