4

I have an n-dimensional grid of points but there are holes in it and I want to get a list of the grid points that are missing. However, I don't want to expand the borders of the existing grid.

E.g. in 2D, I only want the grid point coordinates if there are any values above and below OR left and right. Here's a cartoon where o is an existing point and x is the coordinates I want.

    o o o o o 
o o x o o x o
o x x o o
  o x o o
  o o o o

The data is not in a grid though. It's just a list of the coordinates, i.e.

coords = [(1000,3.5), (1000,4.0), (1000,4.5), (1000,5.0), (1000,5.5), 
(1100,4.5), (1100,5.5), (1200,4.0), (1200,4.5), (1200,5.0), (1200,5.5), 
(1300,3.5), (1300,4.0), (1300,4.5)]

So the values I want are [(1100,3.5), (1100,4.0), (1100,5.0), (1200,3.5)].

I tried getting the min and max of each parameter and making a new axis numpy.arange(min(param1),max(param1),100), then comparing it to the old values via numpy.setdiff1d() but this makes the grid rectangular when it isn't necessarily.

Any ideas on how to efficiently do this?

Joe Flip
  • 1,076
  • 4
  • 21
  • 37
  • 1
    Efficient in `numpy` usually means making things rectangular. If you have lists of varying length it's pretty good sign that you can't 'vectorize' - operate on all values at once. I'd try mapping the points. on to a rectangular grid, and then worry about distinguishing between 'inside' and 'outside' undefined points. Get it working first, then worry about efficiency. – hpaulj Aug 15 '15 at 03:21
  • Ok, @hpaulj I can try that. Any ideas on how to do this with itertools or very inefficiently, like with list comprehensions? – Joe Flip Aug 17 '15 at 19:06
  • How does `(1200,3.5)` feature in the output, as it doesn't seem to satisfy "above and below OR left and right"? – Divakar Aug 22 '15 at 05:44
  • @Divakar, I believe `(1200, 3.5)` is between these points `[(1000, 3.5), (1300,3.5)]`. If generalized to n dimensions, I believe "if there are any values above and below OR left and right" means "between two existing points along at least one dimension". See for example the second `x` in the second row of the cartoon. – askewchan Aug 23 '15 at 22:46
  • @askewchan Hmm now it makes sense! Thanks for pointing that one out. I was assuming the immediate up and down ones. – Divakar Aug 24 '15 at 05:32
  • Just comming to my mind right now: That looks to be a binary erosion/dilation problem. For every dimension you have two kernels for the corresponding axis. You do the binary dilation of the grid with these two kernels and the difference is your answer. Simple as that. – Dschoni Aug 26 '15 at 13:03
  • If N-dimensional-cube like, have a look at scipy.ndimage.morphology.binary_fill_holes to make the ultimate binary closing and do the dilation only for the border areas. – Dschoni Aug 26 '15 at 13:05
  • @Dschoni can you provide an example? I'm unfamiliar with that method. – Joe Flip Aug 26 '15 at 15:07

3 Answers3

3

Dunno about speed, but here is one that works for D dimensions, which is somewhat inspired by existing comments and answers. For a square-ish set of points of width w, it loops about D*w**(D-1) times. It loops over each dimension, looks at a projection along that dimension and loops over all lines along the dimension in that projection, doing the setdiff along each line.

import numpy as np

def grid_holes(coords):
    coords = np.atleast_2d(coords)
    N, D = coords.shape
    coords = coords[np.lexsort(coords.T)]
    diff = np.diff(coords, axis=0)
    spacing = np.where(diff, np.abs(diff), np.inf).min(0)

    missing = []
    for d in xrange(D):
        projection = np.delete(coords, d, 1)
        order = np.lexsort(projection.T)
        gridlines = np.split(coords[order],
                np.diff(projection[order], axis=0).any(1).nonzero()[0] + 1)
        for gridline in gridlines:
            x = gridline[:, d]
            s = spacing[d]
            i = np.round(x/s).astype(int)
            gaps = np.diff(i) - 1
            gap_locs = gaps.nonzero()[0]
            if not len(gap_locs):
                continue
            mx = [ x[loc] + s*(g+1) for loc in gap_locs
                                    for g in xrange(gaps[loc])]
            mcoords = np.repeat(gridline[:1], len(mx), 0)
            mcoords[:, d] = mx
            missing.append(mcoords)
    return np.concatenate(missing)

A function to test it:

def test_grid_holes(coords, known_holes=None, func=grid_holes):
    ret = ()
    if isinstance(coords, tuple) and len(coords)==2:
        # Generate random coords
        N, D = coords
        coords = np.random.randint(0, int(N**(1./D)), coords)
        ret += (coords, )
    else:
        coords = np.atleast_2d(coords)
        N, D = coords.shape
    found_holes = func(coords)
    found_holes = np.unique(found_holes.view('f8,'*D)).view('f8').reshape(-1, D)
    ret += (found_holes,)
    if D <= 3:
        import matplotlib.pyplot as plt
        fig = plt.figure()
        if D == 2:
            ax = fig.add_subplot(111)
        elif D == 3:
            from mpl_toolkits.mplot3d import Axes3D
            ax = fig.add_subplot(111, projection='3d')
        if known_holes is not None:
            known_holes = np.atleast_2d(known_holes)
            ax.scatter(*known_holes.T, c='r', marker='o')
        ax.scatter(*coords.T, c='k', marker='o')
        ax.scatter(*found_holes.T, c='k', marker='x')

    if known_holes is not None:
        known_holes = np.unique(known_holes.view('f8,'*D)).view('f8').reshape(-1, D)
        return np.allclose(found_holes, known_holes)
    else:
        return ret

Here we can test it for your data and generated data:

coords = [(1000,3.5), (1000,4.0), (1000,4.5), (1000,5.0), (1000,5.5),
          (1100,4.5), (1100,5.5), (1200,4.0), (1200,4.5), (1200,5.0),
          (1200,5.5), (1300,3.5), (1300,4.0), (1300,4.5)]
holes = [(1100,3.5), (1100,4.0), (1100,5.0), (1200,3.5)]

test_grid_holes(coords, holes)

test_grid_holes((100, 3))
askewchan
  • 45,161
  • 17
  • 118
  • 134
  • This works but returns duplicate holes, e.g. [1100,5.0]. Thanks! – Joe Flip Sep 01 '15 at 03:16
  • Yeah, if it's between points along more than one direction it will get counted each time, hence the call to `np.unique` in the test function. Honestly @moarningsun's answer is better, more elegant, and about twice as fast. – askewchan Sep 01 '15 at 03:18
3

I think the easiest way is to map the grid to a rectangular array. Because then it's relatively simple and fast to determine which points fall inside the criteria. The downside is that RAM usage might eventually become a problem, especially for sparse grids.

One point still open to debate is how the grid should be defined. The other answers currently use the minimum difference along a dimension between elements as the step size of the grid in that direction. However, this poses problems in rare cases. E.g. if the known coordinates are:

2, 4, 6, 9, 11

Then the step size would be taken equal to 2, but obviously this goes wrong at 9. Maybe it's best to take the greatest common divisor of consecutive differences? E.g. with help of this answer. In my code I've taken a different approach: Only the "ticks" present in the known coordinates are used to construct the grid.

For the 2D case something like the following could suffice:

def find_holes_2d(coords):
    coords = np.asanyarray(coords)

    # determine grid and transform coordinates
    uniq_x, labels_x = np.unique(coords[:,0], return_inverse=True)
    uniq_y, labels_y = np.unique(coords[:,1], return_inverse=True)

    # layout the known grid in an array
    grid = np.zeros([len(uniq_x), len(uniq_y)], bool)
    grid[labels_x, labels_y] = True

    # see which grid points are inside known coordinates
    x_fwd  = np.logical_or.accumulate(grid, axis=0)
    x_bkwd = np.logical_or.accumulate(grid[::-1], axis=0)[::-1]
    y_fwd  = np.logical_or.accumulate(grid, axis=1)
    y_bkwd = np.logical_or.accumulate(grid[:,::-1], axis=1)[:,::-1]

    # select the holes according to the criteria
    holes = ~grid & (x_fwd & x_bkwd | y_fwd & y_bkwd)

    # Transform positions back to original coordinates
    I,J = np.where(holes)
    return np.column_stack([uniq_x[I], uniq_y[J]])

The same approach can applied to the ND case, for example:

def find_holes(coords):
    coords = np.asanyarray(coords)

    uniq, labels = zip(*[np.unique(c, return_inverse=True) for c in coords.T])

    grid = np.zeros(map(len, uniq), bool)
    grid[labels] = True

    candidates = np.zeros_like(grid)
    for dim in range(grid.ndim):
        grid0 = np.rollaxis(grid, dim)
        inside = np.logical_or.accumulate(grid0, axis=0) & 
                 np.logical_or.accumulate(grid0[::-1], axis=0)[::-1]
        candidates |= np.rollaxis(inside, 0, dim+1)
    holes = candidates & ~grid

    hole_labels = np.where(holes)

    return np.column_stack([u[h] for u, h in zip(uniq, hole_labels)])

Finally, there's one issue remaining, shown by this toy example:

o x o o
x   x o
o o o o

Here a hole still remains "undetected". This is trivially solved, by adding the coordinates of the found holes (the x's) to the original coordinate and running a second iteration.

Community
  • 1
  • 1
1

Here's a solution for your example. However, I don't think this can be easily generalized to n-dimensions.

How it works:

Start with the holes in rows. Convert vertex list to array and use lexicographic ordering to sort the rows.

import numpy as np
import matplotlib.pyplot as plt

coords = np.asarray(
    [(1000,3.5), (1000,4.0), (1000,4.5), (1000,5.0), (1000,5.5),
     (1100,4.5), (1100, 6.5), (1200,4.0), (1200,5.5), (1200,7.0), (1200,5.5),
     (1300,3.5), (1300,4.0), (1300,4.5), (1300, 5.5), (1700,5.0) ])

coords = coords[ np.lexsort(( coords[:,1], coords[:,0] )),:]

Determine grid size as the minimum difference between to vertices that is not zero.

diffs = np.diff(coords, axis = 0)
dx = np.min(diffs[diffs[:,0] > 0.0, 0])
dy = np.min(diffs[diffs[:,1] > 0.0, 1])

The grid contains holes where there is no change in the x-coordinate and the change in the y-coordinate is larger than dy.

indices = (diffs[:,0] == 0.0) * (diffs[:,1] > dy)

Expand the holes to list of missing grid points using their indices to extract the starting point and the length of the hole. Finally, concatenate into numpy.array or return empty array if there is no hole.

hole_list = [ np.asarray( [ [x, y] for y in np.arange( y + dy, y + Dy, dy )] )
                            for ((x, y), Dy) in zip ( coords[indices,:],
                                                      diffs[indices,1] ) ]

if len( hole_list ) > 0:
    holes_x = np.concatenate( hole_list )
else:
    holes_x = np.asarray( [] )

Now add the found holes to the grid and look for holes in columns. Only have to switch order of lexicographic ordering and add the holes in the rows to avoid finding them twice.

# Holes in columns.
coords_x = np.append( coords, holes_x, axis = 0 )
coords_x = coords[ np.lexsort( ( coords[:,0], coords[:,1] ) ), : ]
diffs = np.diff( coords_x, axis = 0 )

indices = ( diffs[:,1] == 0.0 ) * ( diffs[:,0] > dx )
hole_list = [ np.asarray( [ [x, y] for x in np.arange( x + dx, x + Dx, dx )] )
                            for ((x, y), Dx) in zip ( coords_x[indices,:],
                                                      diffs[indices,0] ) ]
if len( hole_list ) > 0:
    holes_y = np.concatenate( hole_list )
else:
    holes_y = np.asarray( [] )

Example:

import numpy as np
import matplotlib.pyplot as plt

coords = np.asarray(
    [(1000,3.5), (1000,4.0), (1000,4.5), (1000,5.0), (1000,5.5),
     (1100,4.5), (1100, 6.5), (1200,4.0), (1200,5.5), (1200,7.0), (1200,5.5),
     (1300,3.5), (1300,4.0), (1300,4.5), (1300, 5.5), (1700,5.0) ])

coords = coords[ np.lexsort(( coords[:,1], coords[:,0] )),:]

# Find x and y grid sizes.
diffs = np.diff(coords, axis = 0)
dx = np.min(diffs[diffs[:,0] > 0.0, 0])
dy = np.min(diffs[diffs[:,1] > 0.0, 1])

# Holes in rows.
indices = (diffs[:,0] == 0.0) * (diffs[:,1] > dy)
hole_list = [ np.asarray( [ [x, y] for y in np.arange( y + dy, y + Dy, dy )] )
                            for ((x, y), Dy) in zip ( coords[indices,:],
                                                      diffs[indices,1] ) ]

if len( hole_list ) > 0:
    holes_x = np.concatenate( hole_list )
else:
    holes_x = np.asarray( [] )


# Holes in columns.
coords_x = np.append( coords, holes_x, axis = 0 )
coords_x = coords[ np.lexsort( ( coords[:,0], coords[:,1] ) ), : ]
diffs = np.diff( coords_x, axis = 0 )

indices = ( diffs[:,1] == 0.0 ) * ( diffs[:,0] > dx )
hole_list = [ np.asarray( [ [x, y] for x in np.arange( x + dx, x + Dx, dx )] )
                            for ((x, y), Dx) in zip ( coords_x[indices,:],
                                                      diffs[indices,0] ) ]
if len( hole_list ) > 0:
    holes_y = np.concatenate( hole_list )
else:
    holes_y = np.asarray( [] )

# Plot holes.
f = plt.figure()
ax = f.add_subplot(111)
ax.scatter( coords[:,0], coords[:,1], c = 'g', s=200 )
ax.scatter( holes_x[:,0], holes_x[:,1], c = 'r', s=50 )
ax.scatter( holes_y[:,0], holes_y[:,1], c = 'b', s=50 )
Community
  • 1
  • 1
spfrnd
  • 886
  • 8
  • 15