-2

I am trying to modify the numpy implementation of the Ramer–Douglas–Peucker (RDP) algorithm, to return a mask of the filtered values instead of a filtered array with the values.

The problem is how to return the correct mask recursively.

crlb
  • 1,282
  • 12
  • 18
  • 1
    The RDP algorithm requires a lot of computations on individual 2D points. `pldist` operates on three (2,1)-shaped numpy arrays, for example. NumPy is quite slow compared to pure Python code when there is lots of computation on small arrays. [Sebleier's non-NumPy implementation](https://github.com/sebleier/RDP/), for instance, is ~42x faster. – unutbu Jan 09 '15 at 22:36
  • That makes sense. Maybe it is possible to modify the `pldist` function to operate on multiple points, instead of looping through them. – crlb Jan 12 '15 at 09:03

1 Answers1

1

Well it facilitated to understand the algorithm correctly. If all points between the first and last segment in each recursive call are within epsilon, those should be marked as False while the start and end should be True. Here is my solution:

import numpy as np

def pldist(x0, x1, x2):
    return np.divide(np.linalg.norm(np.linalg.det([x2 - x1, x1 - x0])),
                     np.linalg.norm(x2 - x1))

def rdp_index(M, epsilon=0, dist=pldist):

    dmax = 0.0
    index = -1

    for i in xrange(1, M.shape[0]):
        d = dist(M[i], M[0], M[-1])

        if d > dmax:
            index = i
            dmax = d

    if dmax > epsilon:
        r1 = rdp_index(M[:index + 1], epsilon, dist)
        r2 = rdp_index(M[index:], epsilon, dist)

        return np.vstack((r1[:-1], r2))
    else:
        part_mask = np.empty_like(M, dtype = bool)
        part_mask.fill(False)
        part_mask[0] = True
        part_mask[-1] = True
        return part_mask

Note that this implementation is using recursion, and this solution gave problems with large datasets https://stackoverflow.com/a/2401520/2082968. I.e. the maximum amount of recursion calls were exceeded. Below is a solution that is using a stack and a while loop instead of the recursive calls. In addition, the distances are calculated a bit more efficiently.

def dsquared_line_points(P1, P2, points):
    '''
    Calculate only squared distance, only needed for comparison
    http://en.wikipedia.org/wiki/Distance_from_a_point_to_a_line
    '''
    xdiff = P2[0] - P1[0]
    ydiff = P2[1] - P1[1]
    nom  = (
        ydiff*points[:,0] - \
        xdiff*points[:,1] + \
        P2[0]*P1[1] - \
        P2[1]*P1[0]
    )**2
    denom = ydiff**2 + xdiff**2
    return np.divide(nom, denom)

def rdp_numpy(M, epsilon = 0):

    # initiate mask array
    # same amount of points
    mask = np.empty(M.shape[0], dtype = bool)

    # Assume all points are valid and falsify those which are found
    mask.fill(True)

    # The stack to select start and end index
    stack = [(0 , M.shape[0]-1)]

    while (len(stack) > 0):
        # Pop the last item
        (start, end) = stack.pop()

        # nothing to calculate if no points in between
        if end - start <= 1:
            continue

        # Calculate distance to points
        P1 = M[start]
        P2 = M[end]
        points = M[start + 1:end]
        dsq = dsquared_line_points(P1, P2, points)

        mask_eps = dsq > epsilon**2

        if mask_eps.any():
            # max point outside eps
            # Include index that was sliced out
            # Also include the start index to get absolute index
            # And not relative 
            mid = np.argmax(dsq) + 1 + start
            stack.append((start, mid))
            stack.append((mid, end))

        else:
            # Points in between are redundant
            mask[start + 1:end] = False

    return mask
Community
  • 1
  • 1
crlb
  • 1,282
  • 12
  • 18