Vectorized method to insert arrays of different length at different spots in an ndarray

Question

I'm trying to "expand" a 2D numpy array nadir of points (x,y,z), and fill gaps in space with interpolated points. Where there exist spatial gaps bigger than some tolerance dist, I want to use np.insert to insert the required number of nan rows to fill that gap, adding nan points to be interpolated after.

First, I locate the gaps, and see how many points (rows) I need to insert in each gap to achieve the desired spatial point density:

import numpy as np

# find and measure gaps
nadir = nadir[~np.isnan(nadir).any(axis = 1)]
dist = np.mean(np.linalg.norm(np.diff(nadir[:,0:2], axis = 0), axis = 1), axis = 0) # mean distance for gap definition
gaps = np.where(np.linalg.norm(np.diff(nadir[:,0:2], axis = 0), axis = 1) > dist)[0] # gap locations
sizes = (np.linalg.norm(np.diff(nadir[:,0:2], axis = 0), axis = 1)[gaps] // dist).astype(int) # how many points to insert in each gap

What I wish I could do is pass a list of ndarrays to np.insert() for the values argument, like this:

nadir = np.insert(nadir, gaps, [np.full((size, 3), np.nan) for size in sizes], axis = 0)

such that at every index in gaps, the corresponding nan array of shape (size,3) gets inserted. But the above code doesn't work:

"ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (616,) + inhomogeneous part."

I could achieve this in a loop, but a nice vectorized approach would be ideal. Any ideas? Again, my final goal is to spatial fill gaps in this 3D data with interpolated values, without gridding, so any other clever approaches would also work!

I don't think you want to vectorize an *insert* operation. Every time you insert you copy data and (very likely) allocate memory. That is a lot of unnecessary redundancy if you could scan the array once to calculate the final size and then copy each contiguous sequence (once!) into it's appropriate location. — Woodford, Aug 14 '23 at 18:02
Why are you preferring this method over creating an interpolation and resampling the points with the desired spacing? — jared, Aug 14 '23 at 18:07
thanks, my first thought was to use scipy.interpolate.griddata or a similar function, but how exactly to apply that to this situation escaped me, so I resorted to finding gaps and inserting points like this. I guess I'm not sure how to generate the grid to put into griddata. I'm sure there's a more elegant scipy.interpolate solution to this then? — bt3, Aug 14 '23 at 18:22
`insert` is complicated python code. Complicated because handles many input styles, not because it is conceptually complicated. You could do the same by making a large enough target array, with the needed 'fill' data. Then copy the original data to the right rows (may need advanced indexing) — hpaulj, Aug 14 '23 at 18:40

score 1 · Accepted Answer · answered Aug 14 '23 at 18:51

1

To insert multiple rows, the number of indices has to match the number of rows:

In [176]: x=np.arange(12).reshape(4,3)

In [177]: np.insert(x,[1,1,1,2,2,3],np.ones((6,3),int)+100,axis=0)
Out[177]: 
array([[  0,   1,   2],
       [101, 101, 101],
       [101, 101, 101],
       [101, 101, 101],
       [  3,   4,   5],
       [101, 101, 101],
       [101, 101, 101],
       [  6,   7,   8],
       [101, 101, 101],
       [  9,  10,  11]])

answered Aug 14 '23 at 18:51

hpaulj

221,503
14
230
353

That's it! Thanks! I just need to reform my `gaps` and `sizes` arrays by repeating each ith index in `gaps` `sizes[i]` times! – bt3 Aug 14 '23 at 19:34

bt3 · Answer 2 · 2023-08-14T20:03:04.050

Using hpaulj's hint above, here's my final function:

def nadir_interp(nadir):
    import numpy as np

    # find and measure gaps
    nadir = nadir[~np.isnan(nadir).any(axis = 1)] # ditch nan points
    dist = np.mean(np.linalg.norm(np.diff(nadir[:,0:2], axis = 0), axis = 1), axis = 0) # mean distance for gap definition
    gaps = np.where(np.linalg.norm(np.diff(nadir[:,0:2], axis = 0), axis = 1) > dist)[0] # gap locations
    sizes = (np.linalg.norm(np.diff(nadir[:,0:2], axis = 0), axis = 1)[gaps] // dist).astype(int) # how many points to insert in each gap

    # build new list of gaps, with each ith index repeated sizes[i] number of times
    newgaps = []
    for i in range(len(gaps)):
        newgaps.extend([gaps[i]+1] * sizes[i])

    # insert nan rows
    nadir = np.insert(nadir, newgaps, np.full((len(newgaps),3), np.nan), axis = 0)

    # interp
    for i in range(3):
        nans, x= np.isnan(nadir[:,i]), lambda z: z.nonzero()[0]
        nadir[nans,i] = np.interp(x(nans), x(~nans), nadir[~nans,i])

    return nadir

This answer inspired the interpolation method here. (Note, I'm only using the x and y coordinates in nadir to search for gaps, and ignoring the z, elevation coordinate)

This turns a point cloud with gaps:

into this:

Vectorized method to insert arrays of different length at different spots in an ndarray

2 Answers2