1

How should I test tuples of floats for equality, when they are returned from encoded polylines?


Context:

I am using Google Directions API to find roads inside a region.

My application is expected to often return repeated road segments, but I would like to "clean" these segments. That is, I want to know when I already have a given segment.

Since each node in the road network is a geographical location represented by a tuple(float,float) (a latitude/longitude pair), this means testing tuples of floats for equality.

On one hand, floats are well-known to be unreliable regarding equality comparison. On the other hand, when Google Directions API returns a given same location twice, it is always (by definition, I suppose) exactely the same, since it returns JSON containing encoded polylines).

UPDATE:

here is my current code for decoding the polyline returned from the Directions service (with comments added by me):

def decodePolyline(point_str):

    coord_chunks = [[]]

    for char in point_str:

        value = ord(char) - 63

        split_after = not (value & 0x20)
        value &= 0x1F

        coord_chunks[-1].append(value)

        if split_after:
                coord_chunks.append([])

    del coord_chunks[-1]

    coords = []

    for coord_chunk in coord_chunks:
        coord = 0

        for i, chunk in enumerate(coord_chunk):
            coord |= chunk << (i * 5)
            print coord

        if coord & 0x1:
            coord = ~coord
        coord >>= 1

        ## "shifting" an integer to float
        coord /= 100000.0   

        coords.append(coord)

    points = []
    prev_x = 0
    prev_y = 0
    for i in xrange(0, len(coords) - 1, 2):
        if coords[i] == 0 and coords[i + 1] == 0:
            continue

        prev_x += coords[i + 1]
        prev_y += coords[i]

        # further restricting decimal precision
        points.append([round(prev_x, 6), round(prev_y, 6)])

    return points


if __name__ == '__main__':
    print decodePolyline('_p~iF~ps|U_ulLnnqC_mqNvxq`@')

UPDATE TWO:

This might be of interest, regarding possible API upgrades or replacements in similar applications:

MapQuest Platform Services use a slight variant of the Google Polyline Encoding Format. The MapQuest Platform Services variant of the algorithm allows for arbitrary precision of encoded data. The default precision for encoded data is 5 digits (as with the Google format), but MapQuest line data is often accurate to 6 digits of precision.

Note that the precision used during encoding must be the same as the precision used during decoding, or your data will decompress to values which are off by orders of magnitude from the input data!

heltonbiker
  • 26,657
  • 28
  • 137
  • 252
  • 2
    Have you tried using a decoder that decodes to `Decimal` instead of `float`? – Ignacio Vazquez-Abrams May 22 '16 at 13:59
  • @IgnacioVazquez-Abrams no, not yet, although the idea seems perfect! I'll study that, and any additional hint is most welcome! Thanks for now! – heltonbiker May 22 '16 at 14:00
  • cast to a string and check equality, or set a threshold for "near enough". – cammil May 22 '16 at 14:00
  • @cammil I thought about that. When I plot a bunch of results, the "grid" fomed by the lowest-precision (about 1e-5) is clearly visible, so I could use a smaller threshold, sure. But I believe there might be more direct and less computationally expensive ways to do the same, in this case, so I'd leave it as plan B. – heltonbiker May 22 '16 at 14:03
  • @heltonbiker Why do you think it would be computationally expensive? Floating point addition and comparison are both single operations on probably all chips ever. Decimal or string representations will be O(n) unless you create a custom type that takes advantage of the strings and decimals being the same number of bits. – cammil May 22 '16 at 14:08
  • Additionally, you will benefit from any changes to the API which may give more or fewer digits at some point in the future. Not sure how likely that is though. Also, note you dont need Euclidean distance for "near enough", either lat or lng far enough will suffice. – cammil May 22 '16 at 14:11
  • Possible duplicate of [What is the best way to compare floats for almost-equality in Python?](http://stackoverflow.com/questions/5595425/what-is-the-best-way-to-compare-floats-for-almost-equality-in-python) – ivan_pozdeev May 22 '16 at 14:15
  • @cammil the API seems quite stable. I updated the question with the exact algorithm I'm using. As you can see, up to a given point the values are integers in nature, and are converted to floats by decimal division (shifting the decimal point, so to speak). So maybe I could even consider them true integers instead of Decimals... – heltonbiker May 22 '16 at 14:15
  • @ivan_pozdeev alghough the title is the same, the context presented here is different. Don't know what better title I could give, though... I think it has gone more in the direction of comparison of decoded polylines than comparison of float tuples, to be honest. – heltonbiker May 22 '16 at 14:18
  • Containers are compared by value in Python so it doesn't matter if floats are standalone or in a container. – ivan_pozdeev May 22 '16 at 14:37
  • Can you not use the encoded polylines directly? That will tell you whether you have the tile already or not. – cammil May 22 '16 at 14:43
  • @ivan_pozdeev I meant that, since the values come from inside a polyline decoding function and are originally integers, the best solution I found so far is to hack the decoding function to return ints. That makes the float comparison moot in this context. – heltonbiker May 22 '16 at 15:15
  • @cammil encoded polylines contain the first position, and then only _deltas_ from the previous position (these deltas are expected to be typically small and thus require less significant digits for a given geographic resolution). Thus, I cannot tell if a given position is inside of an encoded polyline before fully decoding it first. – heltonbiker May 22 '16 at 15:17

0 Answers0