A structured array approach (incomplete):
Import the recfunctions helper library:
In [441]: import numpy.lib.recfunctions as rf
Define two structured arrays:
In [442]: A = np.zeros((6,),[('x',int),('y',int)])
Oops, the 'x' keys in B are float, so for consistency, let's make the A ones float as well. Don't mix floats and ints unnecessarily.
In [446]: A = np.zeros((6,),[('x',float),('y',int)])
In [447]: A['x']=np.arange(6)
In [448]: A['y']=np.arange(6)
In [449]: A
Out[449]:
array([( 0., 0), ( 1., 1), ( 2., 2), ( 3., 3), ( 4., 4), ( 5., 5)],
dtype=[('x', '<f8'), ('y', '<i4')])
In [450]: B = np.zeros((6,),[('x',float),('z',float)])
In [451]: B['x']=np.linspace(.5,5.5,6)
In [452]: B['z']=np.linspace(.5,5.5,6)
In [453]: B
Out[453]:
array([( 0.5, 0.5), ( 1.5, 1.5), ( 2.5, 2.5), ( 3.5, 3.5),
( 4.5, 4.5), ( 5.5, 5.5)],
dtype=[('x', '<f8'), ('z', '<f8')])
Look at the docs of the rf.join_by function:
In [454]: rf.join_by?
Do an outer join:
In [457]: rf.join_by('x',A,B,'outer')
Out[457]:
masked_array(data = [(0.0, 0, --) (0.5, --, 0.5) (1.0, 1, --) (1.5, --, 1.5) (2.0, 2, --)
(2.5, --, 2.5) (3.0, 3, --) (3.5, --, 3.5) (4.0, 4, --) (4.5, --, 4.5)
(5.0, 5, --) (5.5, --, 5.5)],
mask = [(False, False, True) (False, True, False) (False, False, True)
(False, True, False) (False, False, True) (False, True, False)
(False, False, True) (False, True, False) (False, False, True)
(False, True, False) (False, False, True) (False, True, False)],
fill_value = ( 1.00000000e+20, 999999, 1.00000000e+20),
dtype = [('x', '<f8'), ('y', '<i4'), ('z', '<f8')])
The result is a masked array, with the missing values masked.
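To see which entries are missing, you can look at the per-field masks. This is a small sketch that rebuilds the same A and B as above; the names joined, y_missing and z_present are mine, for illustration only:

```python
import numpy as np
import numpy.lib.recfunctions as rf

# Same two structured arrays as in the session above
A = np.zeros((6,), [('x', float), ('y', int)])
A['x'] = np.arange(6)
A['y'] = np.arange(6)
B = np.zeros((6,), [('x', float), ('z', float)])
B['x'] = np.linspace(.5, 5.5, 6)
B['z'] = np.linspace(.5, 5.5, 6)

joined = rf.join_by('x', A, B, 'outer')

# Indexing by field name gives an ordinary 1-D masked array
y_missing = np.ma.getmaskarray(joined['y'])  # True where A had no matching x
z_present = joined['z'].compressed()         # only the unmasked z values
```

Each field carries its own mask, so 'y' and 'z' can be handled separately.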
Same thing, but with masking turned off:
In [460]: rf.join_by('x',A,B,'outer',usemask=False)
Out[460]:
array([( 0. , 0, 1.00000000e+20), ( 0.5, 999999, 5.00000000e-01),
( 1. , 1, 1.00000000e+20), ( 1.5, 999999, 1.50000000e+00),
( 2. , 2, 1.00000000e+20), ( 2.5, 999999, 2.50000000e+00),
( 3. , 3, 1.00000000e+20), ( 3.5, 999999, 3.50000000e+00),
( 4. , 4, 1.00000000e+20), ( 4.5, 999999, 4.50000000e+00),
( 5. , 5, 1.00000000e+20), ( 5.5, 999999, 5.50000000e+00)],
dtype=[('x', '<f8'), ('y', '<i4'), ('z', '<f8')])
Now we see the fill values explicitly. There must be a way of replacing the 1e20 with np.nan. Replacing 999999 with nan is messier, since np.nan is a float value, not an integer.
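One possible way around this (a sketch, not necessarily the neatest) is to keep the masked result and fill each field with np.nan yourself, casting the integer field to float first. The all-float output array out is my own choice here, not something join_by produces:

```python
import numpy as np
import numpy.lib.recfunctions as rf

# Same two structured arrays as in the session above
A = np.zeros((6,), [('x', float), ('y', int)])
A['x'] = np.arange(6)
A['y'] = np.arange(6)
B = np.zeros((6,), [('x', float), ('z', float)])
B['x'] = np.linspace(.5, 5.5, 6)
B['z'] = np.linspace(.5, 5.5, 6)

joined = rf.join_by('x', A, B, 'outer')  # masked structured array

# Assumption: it is acceptable to store every field as float,
# so that NaN can stand in for the missing values.
out = np.zeros(joined.shape, dtype=[(n, float) for n in joined.dtype.names])
for name in joined.dtype.names:
    # cast to float (needed for the integer 'y'), then fill masked slots
    out[name] = joined[name].astype(float).filled(np.nan)
```

The cost is that 'y' is no longer an integer field; if it has to stay integer, a sentinel value or the mask itself is probably the better choice.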
Under the covers, join_by is probably first creating a blank array with the join dtype, and then filling in the fields one by one.
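That guess can be tried out by hand. The following is a rough sketch of the "blank array, then fill field by field" idea, not numpy's actual implementation. manual_outer_join is a hypothetical helper; it assumes the key values are unique within each input, that the non-key field names do not collide, and that all non-key fields may be stored as float:

```python
import numpy as np

def manual_outer_join(key, a, b):
    """Hypothetical outer join on one key field, with NaN for missing values."""
    keys = np.union1d(a[key], b[key])  # sorted, unique union of the keys
    other = [(n, float) for arr in (a, b)
             for n in arr.dtype.names if n != key]
    # Blank array with the joined dtype
    out = np.zeros(keys.shape, dtype=[(key, keys.dtype)] + other)
    out[key] = keys
    for name, _ in other:
        out[name] = np.nan             # start with everything "missing"
    # Fill in the fields one by one, at the rows matching each input's keys
    for arr in (a, b):
        pos = np.searchsorted(keys, arr[key])
        for name in arr.dtype.names:
            if name != key:
                out[name][pos] = arr[name]
    return out

A = np.zeros((6,), [('x', float), ('y', int)])
A['x'] = np.arange(6)
A['y'] = np.arange(6)
B = np.zeros((6,), [('x', float), ('z', float)])
B['x'] = np.linspace(.5, 5.5, 6)
B['z'] = np.linspace(.5, 5.5, 6)

joined = manual_outer_join('x', A, B)
```

Since the keys are matched with exact float comparison, this only works when the x values agree exactly, which is the same caveat that applies to joining on float keys in general.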