Fitting an Orthogonal Grid to Noisy Coordinates

Question

Problem

I have a list of coordinates that are meant to form a grid. Each coordinate has a random error component, and some of the coordinates are missing. The grid could also be rotated (update). I want to fit an orthogonal grid to the data points and return a list of the grid's vertices. For example:

[Figure: Grid Problem]

Application

The purpose is to find a grid in a scanned image. The data points come from the results of contour or edge detection in OpenCV. An example is an image containing a grid of photos.

Goal

I wrote some Python code that works, but would like to find a linear algebra algorithm using SciPy, statsmodels or other modules that would be more robust and handle a small rotation of the grid (less than 10°).

Python Code Using Lists Only

# Noisy [x, y] coordinates (origin is upper-left corner)
pts = [[103,101],
       [198,103],
       [300, 99],
       [ 97,205],
       [304,202],
       [102,295],
       [200,303],
       [104,405],
       [205,394],
       [298,401]]

def row_col_avgs(num_list, ratio):
    # Finds the average of each row and column. Coordinates are
    # assigned to a row and column by specifying an error ratio.
    last_num, sum_nums, count_nums, avgs = 0, 0, 0, []
    num_list.sort()
    for num in num_list:
        # Calculate average for last row or column and begin new row or column
        if num > (1+ratio)*last_num and count_nums != 0:
            avgs.append(int(round(sum_nums/count_nums,0)))
            sum_nums = num
            count_nums = 1
        # Or continue with current row or column
        else:
            sum_nums += num
            count_nums += 1
        last_num = num
    avgs.append(int(round(sum_nums/count_nums,0)))
    return avgs

# Split coordinates into two lists of x's and y's
xs, ys = map(list, zip(*pts))

# Find averages of each row and column of the grid
x_avgs = row_col_avgs(xs, 0.1)
y_avgs = row_col_avgs(ys, 0.1)

# Return vertices of completed averaged grid
avg_grid = []
for y_avg in y_avgs:
    avg_row = []
    for x_avg in x_avgs:
        avg_row.append([int(x_avg), int(y_avg)])
    avg_grid.append(avg_row)

print(avg_grid)

Output

[[[102, 101], [201, 101], [301, 101]], 
 [[102, 204], [201, 204], [301, 204]], 
 [[102, 299], [201, 299], [301, 299]], 
 [[102, 400], [201, 400], [301, 400]]]

Can you elaborate a little on how your clustering average algorithm works? Also, this code can definitely be made more efficient with NumPy, since everything you're doing now uses lists. For example, `xs, ys = map(list, zip(*xys))` can simply be `xs, ys = xys[:, 0], xys[:, 1]` if `xys` is a NumPy array. Maybe give it a try on your own? – r4bb1t, Jul 17 '20 at 04:39
PyWalker, my row_col_avgs function loops through a sorted list of numbers. When the ratio of a number to its preceding number is greater than 1.1, the current cluster ends and its average is calculated. I'll use your suggestion when switching to NumPy. Thanks. – Jakub, Jul 17 '20 at 06:23

Jakub · Answer 1 · 2022-07-09T21:38:07.080

Parallel Slopes Ordinary Least Squares (OLS) Model:
y = mx + grp + b where m=slope, b=y-intercept, & grp=categorical variable.

This is an alternative algorithm that can handle a rotated grid.

The OLS model includes both the data points in the original orientation and a 90° rotation of the same data points. This is necessary so all gridlines are parallel and have the same slope.
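
To make the idea of a shared slope concrete, here is a minimal sketch that fits parallel lines with plain NumPy least squares rather than the statsmodels formula interface used in the full code below. The function name `fit_parallel_lines` and the toy data are assumptions made for illustration; they are not part of the original code.

```python
import numpy as np

def fit_parallel_lines(x, y, labels):
    """Least-squares fit of y = m*x + b_g with one shared slope m
    and a separate intercept b_g for every group label g."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    groups, idx = np.unique(labels, return_inverse=True)
    # Design matrix: one indicator column per group, then the x column
    X = np.zeros((x.size, groups.size + 1))
    X[np.arange(x.size), idx] = 1.0
    X[:, -1] = x
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    intercepts = dict(zip(groups.tolist(), coef[:-1].tolist()))
    return coef[-1], intercepts

# Toy data (assumed): two noise-free gridlines sharing the slope 0.05
x = np.array([100.0, 200.0, 300.0, 100.0, 300.0])
labels = np.array(['o0', 'o0', 'o0', 'o1', 'o1'])
y = 0.05 * x + np.where(labels == 'o0', 100.0, 200.0)

slope, intercepts = fit_parallel_lines(x, y, labels)
print(slope, intercepts)
```

Because every gridline contributes to the same slope estimate, a line with only one or two surviving points still gets a usable equation, which is the main reason for pooling both orientations into one model.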

Algorithm:

1. Find a reference gridline to compare with remaining points by choosing two neighboring points in the first or last row with a slope closest to zero.
2. Calculate the distances between this reference line and the remaining points.
3. Segment points into groups w.r.t. the calculated distances (one group per gridline).
4. Repeat steps 1 to 3 for the 90 degree rotated grid and combine results.
5. Create a parallel slopes OLS model to determine linear equations for the gridlines.
6. Rotate the rotated gridlines back to their original orientation.
7. Calculate the intersection points.
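
The last two steps can be sketched in isolation. The snippet assumes the same 90 degree rotation as the full code below, (x', y') = (-y, x), so a line y' = m*x' + b in the rotated frame becomes y = -x/m + b/m in the original frame. The helper names `rotate_line_back` and `intersect` and the numeric values are hypothetical.

```python
def rotate_line_back(m, b):
    """Line y' = m*x' + b fitted to points rotated by (x', y') = (-y, x),
    expressed in the original frame as y = slope*x + intercept."""
    # x = m*(-y) + b  ->  y = -x/m + b/m   (m must be non-zero)
    return -1.0 / m, b / m

def intersect(m1, b1, m2, b2):
    """Intersection of y = m1*x + b1 and y = m2*x + b2 (m1 != m2)."""
    x = (b2 - b1) / (m1 - m2)
    return x, m1 * x + b1

# Hypothetical fitted values: shared slope 0.05, one line per orientation
m = 0.05
o_b = 100.0   # intercept of a gridline in the original orientation
r_b = 200.0   # intercept of a gridline fitted in the rotated frame

r_m_back, r_b_back = rotate_line_back(m, r_b)
x, y = intersect(m, o_b, r_m_back, r_b_back)
print(r_m_back, r_b_back, x, y)
```

The two slopes multiply to -1, so the recovered gridlines are orthogonal by construction. A shared slope of exactly zero would make the back-rotation divide by zero, so a perfectly axis-aligned grid needs separate handling in this sketch.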

Note: The method fails if the noise, the rotation angle, or the fraction of missing points is too large.

Example:

Python Code to Create Example

def create_random_example():
    # Requires import of numpy and random packages
    # Creates grid with random noise and missing points
    # Example will fail if std_dev, rotation, pct_removed too large
    
    # Parameters
    first_row, last_row = 100, 900
    first_col, last_col = 100, 600
    num_rows = 6
    num_cols = 4
    rotation = 3 # degrees that grid is rotated
    sd = 3 # percent std dev of avg x and avg y coordinates
    pct_remove = 30 # percent of points to randomly remove from data
    
    # Create grid
    x = np.linspace(first_col, last_col, num_cols)
    y = np.linspace(first_row, last_row, num_rows)
    xx, yy = np.meshgrid(x, y)
    
    # Add noise
    x = xx.flatten() + sd * np.mean(xx) * np.random.randn(xx.size) / 100
    y = yy.flatten() + sd * np.mean(yy) * np.random.randn(yy.size) / 100
    
    # Randomly remove points
    random_list = random.sample(range(0, num_cols*num_rows), 
                          int(pct_remove*num_cols*num_rows/100))
    x, y = np.delete(x, random_list), np.delete(y, random_list)
    
    pts = np.column_stack((x, y))
    
    # Rotate points
    radians = np.radians(rotation)
    rot_mat = np.array([[np.cos(radians),-np.sin(radians)],
                        [np.sin(radians), np.cos(radians)]])
    einsum = np.einsum('ji, mni -> jmn', rot_mat, [pts])
    pts = np.squeeze(einsum).T
    
    return np.rint(pts)

Python Code to Fit Gridlines

import numpy as np
import pandas as pd
import itertools
import math
import random
from statsmodels.formula.api import ols
from scipy.spatial import KDTree
import matplotlib.pyplot as plt

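def pt_line_dist(pt, ref_line):
    pt1, pt2 = [ref_line[:2], ref_line[2:]]
    # Perpendicular distance from pt to the line through the origin that is
    # parallel to ref_line. This differs from the distance to ref_line only
    # by a constant offset (while all points lie on one side of that line),
    # which is all the grouping step needs.
    return np.linalg.norm(np.cross(pt1 - pt2, [pt[0],pt[1]])) \
         / np.linalg.norm(pt1 - pt2)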

def segment_pts(amts, grp_var, grp_label):
    # Segment on amounts (distances here) in last column of array
    # Note: need to label groups with string for OLS model
    amts = amts[amts[:, -1].argsort()]
    first_amt_in_grp = amts[0][-1]
    group, groups, grp = [], [], 0
    for amt in amts:
        if amt[-1] - first_amt_in_grp > grp_var:
            groups.append(group)
            first_amt_in_grp = amt[-1]
            group = []; grp += 1
        group.append(np.append(amt[:-1],[[grp_label + str(grp)]]))
    groups.append(group)
    return groups

def find_reference_line(pts):
    # Find the point with the minimum absolute slope relative to both the
    # min-y point and the max-y point
    y = np.hsplit(pts, 2)[1] # y column of array
    m = []
    for i, y_pt in enumerate([ pts[np.argmin(y)], pts[np.argmax(y)] ]):
        m.append(np.zeros((pts.shape[0]-1, 5))) # dtype default is float64
        m[i][:,2:4] = np.delete(pts, np.where((pts==y_pt).all(axis=1))[0], axis=0)
        m[i][:,4] = abs( (m[i][:,3]-y_pt[1]) / (m[i][:,2]-y_pt[0]) )
        m[i][:,:2] = y_pt
    m = np.vstack((m[0], m[1]))
    return m[np.argmin(m[:,4]), :4]

# Ignore division by zero (slopes of vertical lines)
np.seterr(divide='ignore')

# Create dataset and plot
pts = create_random_example()
plt.scatter(pts[:,0], pts[:,1], c='r') # plot now because pts array changes

# Average distance to the nearest neighbor of each point
tree = KDTree(pts)
nn_avg_dist = np.mean(tree.query(pts, 2)[0][:, 1])

# Find groups of points representing each gridline
groups = []
for orientation in ['o', 'r']: #  original and rotated orientations
    
    # Rotate points 90 degrees (note: this moves pts to 2nd quadrant)
    if orientation == 'r':
        pts[:,1] = -1 * pts[:,1]
        pts[:, [1, 0]] = pts[:, [0, 1]]
    
    # Find reference line to compare remaining points for grouping
    ref_line = find_reference_line(pts) # line is defined by two points
    
    # Distances between points and reference line
    pt_dists = np.zeros((pts.shape[0], 3))
    pt_dists[:,:2] = pts
    pt_dists[:,2] = np.apply_along_axis(pt_line_dist, 1, pts, ref_line).T
    
    # Segment pts into groups w.r.t. distances (one group per gridline)
    # Each group spans at most 0.7*nn_avg_dist.
    groups += segment_pts(pt_dists, 0.7*nn_avg_dist, orientation)

# Create dataframe of groups (OLS model requires a dataframe)
df = pd.DataFrame(np.vstack(groups), columns=['x', 'y', 'grp'])
df['x'] = pd.to_numeric(df['x'])
df['y'] = pd.to_numeric(df['y'])

# Parallel slopes OLS model
ols_model = ols("y ~ x + grp + 0", data=df).fit()

# OLS parameters
grid_lines = ols_model.params.iloc[:-1].to_frame() # pandas series to dataframe
grid_lines = grid_lines.rename(columns = {0:'b'})
grid_lines['grp'] = grid_lines.index.str[4:-1] # 'grp[o0]' -> 'o0'
grid_lines['m'] = ols_model.params.iloc[-1] # slope

# Rotate the rotated lines back to their original orientation
grid_lines.loc[grid_lines['grp'].str[0] == 'r', 'b'] = grid_lines['b'] / grid_lines['m']
grid_lines.loc[grid_lines['grp'].str[0] == 'r', 'm'] = -1 / grid_lines['m']

# Find grid intersection points by combinations of gridlines
comb = list(itertools.combinations(grid_lines['grp'], 2))
comb = [i for i in comb if i[0][0] != 'r']
comb = [i for i in comb if i[1][0] != 'o']
df_comb = pd.DataFrame(comb, columns=['grp', 'r_grp'])

# Merge gridline parameters with grid points
grid_pts = df_comb.merge(grid_lines.drop_duplicates('grp'),how='left',on='grp')
grid_lines.rename(columns={'grp': 'r_grp'}, inplace=True)
grid_pts.rename(columns={'b':'o_b', 'm': 'o_m', 'grp':'o_grp'}, inplace=True)
grid_pts = grid_pts.merge(grid_lines.drop_duplicates('r_grp'),how='left',on='r_grp')
grid_pts.rename(columns={'b':'r_b', 'm': 'r_m'}, inplace=True)

# Calculate x, y coordinates of gridline interception points
grid_pts['x'] = (grid_pts['r_b']-grid_pts['o_b']) \
              / (grid_pts['o_m']-grid_pts['r_m'])
grid_pts['y'] = grid_pts['o_m'] * grid_pts['x'] + grid_pts['o_b']

# Results output
print(grid_lines)
print(grid_pts)

plt.scatter(grid_pts['x'], grid_pts['y'], s=8, c='b') # for setting axes

axes = plt.gca()
axes.invert_yaxis()
axes.xaxis.tick_top()
axes.set_aspect('equal')
axes.set_xlim(axes.get_xlim())
axes.set_ylim(axes.get_ylim())

x_vals = np.array(axes.get_xlim())
for idx in grid_lines.index:
    y_vals = grid_lines['b'][idx] + grid_lines['m'][idx] * x_vals
    plt.plot(x_vals, y_vals, c='gray')

plt.show()

"Fails when grid is rotated around 45 degrees": a straight square grid can always be seen as a lacunary grid rotated by 45°. — , Jul 05 '22 at 07:51
@YvesDaoust Yes, and gridlines resulting from lacunary grid would be the exact failure that I'm referring to. — Jakub, Jul 08 '22 at 03:12

score 1 · Answer 2 · answered Jul 04 '22 at 09:16

If you project all points onto a vertical or horizontal axis, the problem turns into one of clustering with equally spaced clusters.

To perform these clusterings, you can consider the distances between the successive (sorted) points. They will form two clusters: short distances corresponding to noise, and longer ones for the grid size. You can solve the two-way clustering using the Otsu method.
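
A minimal sketch of this idea for one axis, assuming the grid is not rotated. It applies Otsu's criterion directly to the gaps between sorted coordinates (not to an image histogram) and does not rely on equal spacing. The function names `otsu_threshold` and `cluster_1d` are made up for this example, and the x values are taken from the question.

```python
import numpy as np

def otsu_threshold(values):
    """Threshold that maximizes the between-class variance of a 1-D sample
    (Otsu's criterion evaluated at every split of the sorted values)."""
    v = np.sort(np.asarray(values, dtype=float))
    n = v.size
    best_t, best_var = v[0], -1.0
    for k in range(1, n):                 # low class = v[:k], high class = v[k:]
        if v[k] == v[k - 1]:
            continue                      # no threshold fits between equal values
        w0, w1 = k / n, (n - k) / n
        between = w0 * w1 * (v[:k].mean() - v[k:].mean()) ** 2
        if between > best_var:
            best_var, best_t = between, 0.5 * (v[k - 1] + v[k])
    return best_t

def cluster_1d(coords):
    """Group sorted coordinates, starting a new cluster at every large gap."""
    s = np.sort(np.asarray(coords, dtype=float))
    gaps = np.diff(s)
    if gaps.size == 0:                    # zero or one coordinate: nothing to split
        return [s]
    breaks = np.flatnonzero(gaps > otsu_threshold(gaps)) + 1
    return np.split(s, breaks)

xs = [103, 198, 300, 97, 304, 102, 200, 104, 205, 298]
centers = [float(c.mean()) for c in cluster_1d(xs)]
print(centers)
```

Unlike a fixed ratio between neighbouring values, the threshold here is derived from the data, so it does not depend on how far the grid lies from the origin.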

I am not sure how to implement two-way clustering to find the grid. I need more info. Sounds more expensive than segmenting the points w.r.t. distances. I'm using the SciPy cKDTree to find distances between rows and columns. I'm using a parallel slopes OLS model for the noise. I have posted what I'm using as answer to my OP. I have used Otsu's thresholding in OpenCV, but not directly for data clustering. I'll have to look into that. Thanks. Also, the rows and columns are not necessarily equally spaced, if that matters. – Jakub Jul 04 '22 at 09:48
I forgot to add that the grid points could be rotated relative to the axes in my application, so that would be a problem for projecting the points. – Jakub Jul 04 '22 at 10:05
I actually wasted my time describing this method, which does not apply to the rotated scenario. – Jul 04 '22 at 19:57
@Ives Daoust: My bad, because I did not mention rotation in my OP. If the angle of rotation is known, do you believe your method would be better than segmentation on distance followed by OLS for noise? I have a simple method for a first estimate of rotation by finding the smallest slope of k-nearest neighbors, then segmenting slopes for all points and choose average of the smallest slope group. Considering the application is for images, OpenCV could be used to find angle, although I find using the grid points to be better. – Jakub Jul 04 '22 at 21:10
@user1196549 Can you describe your method in detail? I am trying to implement it. – Coddy Jul 25 '23 at 16:52

score 0 · Answer 3 · answered Jun 20 '22 at 09:51

A NumPy implementation of your code is given below. Because the size of AvgGrid is known in advance, I pre-allocate the required memory rather than appending. This should be faster, especially when the number of output vertices is large.

import numpy as np

# Input of [x, y] coordinates of a sparse grid with errors
xys = np.array([[103,101],
       [198,103],
       [300, 99],
       [ 97,205],
       [304,202],
       [102,295],
       [200,303],
       [104,405],
       [205,394],
       [298,401]])

# Function to average
def ColAvgs(CoordinateList, CutoffRatio = 1.1):

    # Length of CoordinateList
    L = len(CoordinateList)

    # Sort input
    SortedList = np.sort(CoordinateList)

    # Determine indices to average
    RelativeIncrease = SortedList[-(L-1):]/SortedList[:(L-1)]
    CriticalIndices = np.flatnonzero(RelativeIncrease > CutoffRatio) + 1
    Indices = np.hstack((0,CriticalIndices))
    if (Indices[-1] != L):
        Indices = np.hstack((Indices,L))
    #print(Indices)     # Uncomment to show index construction

    # Compute averages
    Avgs = np.full(len(Indices)-1, np.nan)  # np.NaN was removed in NumPy 2.0
    for i in range(len(Avgs)):
        Avgs[i] = int( round(np.mean(SortedList[Indices[i]:Indices[i+1]]) ) )

    # Return output
    return Avgs

# Compute x- and y-coordinates of vertices
AvgsXcoord = ColAvgs(xys[:,0])
AvgsYcoord = ColAvgs(xys[:,1])

# Return all vertices
AvgGrid = np.full((len(AvgsXcoord)*len(AvgsYcoord), 2), np.nan)
row = 0  # index of the next vertex to fill (avoids shadowing built-in iter)
for y in AvgsYcoord:
    for x in AvgsXcoord:
        AvgGrid[row, :] = np.hstack((x,y))
        row = row+1
print(AvgGrid)
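
As a possible alternative to the nested loop, the vertex array could be built in one step with `np.meshgrid`. This is only a sketch; the averaged coordinates below are assumed values standing in for the output of ColAvgs.

```python
import numpy as np

# Hypothetical averaged column and row positions (as returned by ColAvgs)
AvgsXcoord = np.array([102., 201., 301.])
AvgsYcoord = np.array([101., 204., 299., 400.])

# Every (x, y) combination, ordered row by row like the nested loop
xx, yy = np.meshgrid(AvgsXcoord, AvgsYcoord)
AvgGrid = np.column_stack((xx.ravel(), yy.ravel()))
print(AvgGrid.shape)   # (12, 2)
```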

Thank you for your numpy implementation. I wrote new code to incorporate a parallel slopes OLS model rather than using row and column averages. It handles rotated grids. It does a single least squares for both the horizontal and vertical gridlines at the same time. I posted it as an answer. Any suggestions for speeding it up are welcomed. — Jakub, Jul 04 '22 at 09:14