The script below takes one string input as a polyline and returns a data frame with the corresponding latitude/longitude pairs.

I would like the input to be a data set as follows:

ActivityID Polyline
1 PolyLineValue1
2 PolyLineValue2
3 PolyLineValue3

and the output to be (keeping the ActivityID)

ActivityID latitude longitude
1 123 123
1 123 123
1 123 123
2 123 123
2 123 123
2 123 123
3 123 123
3 123 123
3 123 123

I was thinking of iterating over the input data set to do this, but I've read here that this is not a great idea performance-wise.

Could someone please advise how to do this in Python?

import pandas as pd

polyline_str = 'eldyHyOOCKBuA~@_Ar@[LgC`CaAdAaBtA[h@Sv@a@|@St@M\\g@l@TIP?FMDi@Pw@L_@Ra@XH^jAfAtFF|@@~@AhB@z@Aj@]g@Uq@k@oAUu@Ow@UmBWgAUVs@zAc@pBu@xAg@vAm@lBaAtCYn@Y~@Qz@gArCc@\\]`@y@j@e@M}AyCM]Ou@_@kAe@mBqAeGaAcFWo@e@eCWwBY_BWwB[kBAu@LY^a@^a@f@w@d@]n@o@\\q@r@_AVa@Vm@TSv@yAhBoBv@cA`B_BnB_C~@cA\\c@^]pAyAVUJn@Az@_@pCCz@YxBUrBEv@Vt@b@Rf@JpBTd@LnBThAPpB`@b@@jAPd@JJL@VK`@eAbAaAlA]T_@b@_@\\w@xA_AjAa@b@OJ[h@[l@Sr@a@bAg@jBLW`@yA`@e@NKNPXdA|@xEL|@@zCFnBGt@a@To@RkARuA@gAKa@@s@vA_@d@URm@z@aAnB[b@Up@q@tA_@d@_AbAc@Vm@RYNe@`@cAt@a@`@_@PYLg@SA?ORCf@EJa@TeCtBKJmC~Au@La@hAJnA?XcUaD@^Cg@BGBBMJOPIVC\\?f@Hx@ZlBH|@F`AXtDD~@Hr@B~@BJf@Pd@FBTC`BHrCAx@WjCKz@E`A]lBOvBSpBc@|Aa@`Cq@zC_@rBmArE[j@o@~AkAdC}@hAy@jA}@`A[h@_@f@]T}@_C]k@G@c@Xs@sAUOcApAeA~@wA`AQBa@Tc@LiAd@i@Hc@@e@AmAFgAAe@Hg@Pc@`@{@bAe@HkAJKHExBWv@F|@\\x@BrAQPe@@e@My@AmAIkAUaAYc@Dg@Kg@E{CFkAMkAHoGAoBCe@Kg@EoASkAc@kAK_@o@g@FMDYj@GV]b@aA\\{CbAcAfAc@Xc@N", "_ujyHpoDb@Ub@]^c@`@[pBu@hAYb@Q`@Wp@qA\\FBBVp@hABf@Ld@Zb@LPCnBZd@?t@FlAARBjA@l@Kd@[hAEz@Nd@Np@?r@I`AUdBp@fBTlCJx@Hd@Ab@WNu@SgACw@GS]u@Yu@BGnAg@hAWd@Yl@k@^g@d@S\\GfAEf@?b@Hj@?f@FTAhASd@Yb@U`@[`@_@~@kAb@S^]~AkBZk@F?Zf@\\t@Vn@Nz@nAxBHCz@_Az@gAt@s@RY^a@t@wAZc@h@uAT_@~@oDv@qBH{@Nu@Xk@Nq@Bs@Lu@PuAPy@F{@Ps@Lw@Ru@Fk@Rw@@}@Kw@NaAP_DGy@?_AF}@Gw@M{@IQa@Du@iGMu@KeBQ{AQw@Iy@Ss@H}@\\g@Z]?CCChTpB@HHJBERGDOXMt@Od@@RBvAZFCTAPFv@AFGH?x@u@\\c@Vi@EMeAqCA_@Xm@RW~@cA~@w@|@cAt@c@Pe@Kw@_BeGcAcF}@uDmBoHKu@G{@YuBI}@i@yCK{@WkAJw@^c@`@y@lAc@^YrA}B|@iAPe@t@uAd@s@z@cArAsAnAmAn@}@~@aA^e@t@iA`AaAtA{A\\HEz@UjBQvBSlAG~@YnB?|@Vl@`@NfARf@B\\FhBNdBRbF`AjAP\\d@`@lA'

index, lat, lng = 0, 0, 0

# List of decoded (latitude, longitude) tuples
coordinates = []

# Dict holding the most recent latitude/longitude deltas
changes = {'latitude': 0, 'longitude': 0}

# Coordinates have variable length when encoded, so just keep
# track of whether we've hit the end of the string. In each
# while loop iteration, a single coordinate is decoded.
while index < len(polyline_str):
    # Gather lat/lon changes, store them in a dictionary to apply them later
    for unit in ['latitude', 'longitude']:
        shift, result = 0, 0

        while True:
            byte = ord(polyline_str[index]) - 63
            index += 1
            result |= (byte & 0x1f) << shift
            shift += 5
            # A chunk below 0x20 has no continuation bit, so it is the last one
            if byte < 0x20:
                break

        if (result & 1):
            changes[unit] = ~(result >> 1)
        else:
            changes[unit] = (result >> 1)

    lat += changes['latitude']
    lng += changes['longitude']

    coordinates.append((lat / 100000.0, lng / 100000.0))

df = pd.DataFrame(coordinates, columns = ['lat', 'lng'])
print(df)
SuperStormer
Mark

1 Answer

First, we wrap your code into a function:

def decode_polyline(polyline_str):
    index, lat, lng = 0, 0, 0

    # List of decoded (latitude, longitude) tuples
    coordinates = []

    # Dict holding the most recent latitude/longitude deltas
    changes = {'latitude': 0, 'longitude': 0}

    while index < len(polyline_str):
        # Gather lat/lon changes, store them in a dictionary to apply them later
        for unit in ['latitude', 'longitude']:
            shift, result = 0, 0

            while True:
                byte = ord(polyline_str[index]) - 63
                index += 1
                result |= (byte & 0x1f) << shift
                shift += 5
                # A chunk below 0x20 has no continuation bit, so it is the last one
                if byte < 0x20:
                    break

            if (result & 1):
                changes[unit] = ~(result >> 1)
            else:
                changes[unit] = (result >> 1)

        lat += changes['latitude']
        lng += changes['longitude']

        coordinates.append((lat / 100000.0, lng / 100000.0))
    return coordinates

This function accepts a single polyline string and returns a list of (latitude, longitude) tuples, one per decoded point.
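
To make the inner loop less opaque, here is a minimal sketch that decodes one signed value at a time. The helper name decode_value and its return shape are made up for this illustration and are not part of the code above. It is run on the second sample polyline from this answer.

```python
def decode_value(encoded, index=0):
    """Decode one signed value starting at `index`.

    Hypothetical helper for illustration; returns (value, next_index).
    """
    shift, result = 0, 0
    while True:
        chunk = ord(encoded[index]) - 63   # undo the +63 ASCII offset
        index += 1
        result |= (chunk & 0x1F) << shift  # the low 5 bits carry data
        shift += 5
        if chunk < 0x20:                   # no continuation bit: last chunk
            break
    # The lowest bit of the assembled number is the sign flag.
    value = ~(result >> 1) if result & 1 else result >> 1
    return value, index


# Each point is a latitude value followed by a longitude value,
# both stored as integers scaled by 1e5.
lat_e5, next_index = decode_value('ecp~FffkvO')
lng_e5, end_index = decode_value('ecp~FffkvO', next_index)
print(lat_e5 / 1e5, lng_e5 / 1e5)  # 41.86691 -87.717
```

For the first point of a polyline these values are absolute. For later points they are deltas relative to the previous point, which is why decode_polyline keeps running totals in lat and lng.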

We use the following input as an example.

df = pd.DataFrame({
    'ActivityID': [1,2],
    'Polyline': ['ivq~FvoyuOi{l~FjzdvO', 'ecp~FffkvO']
})

Then we create a new column called latlong, which holds the list of (latitude, longitude) tuples decoded from each polyline.

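As you've already pointed out, explicitly iterating over every row in a dataframe is not a good idea performance-wise, so we use the apply function here. Note that apply still calls decode_polyline once per row; it avoids the row-by-row dataframe bookkeeping rather than the decoding work itself.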

df['latlong'] = df['Polyline'].apply(decode_polyline)

The dataframe now looks like this:

ActivityID Polyline latlong
1 ivq~FvoyuOi{l~FjzdvO [(41.87509, -87.62636), (83.72538, -175.31074)]
2 ecp~FffkvO [(41.86691, -87.717)]

To unpack the list of coordinates, we can use the explode function from pandas.

df = df.set_index(['ActivityID', 'Polyline']).explode('latlong').reset_index()
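
As a side note, explode can also be called on the dataframe directly, without the set_index/reset_index round trip. The sketch below assumes pandas 0.25 or newer (where DataFrame.explode exists) and uses already-decoded coordinates so that it runs on its own; the variable name exploded is only for illustration.

```python
import pandas as pd

# Already-decoded coordinates, copied from the table above.
df = pd.DataFrame({
    'ActivityID': [1, 2],
    'latlong': [
        [(41.87509, -87.62636), (83.72538, -175.31074)],
        [(41.86691, -87.717)],
    ],
})

# One row per list element; the other columns are repeated.
# reset_index(drop=True) replaces the duplicated index labels with 0..n-1.
exploded = df.explode('latlong').reset_index(drop=True)
print(exploded)
```

Only the outer list is expanded, so each cell of latlong still holds one (latitude, longitude) tuple afterwards.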

After this command, the dataframe looks like this:

ActivityID Polyline latlong
1 ivq~FvoyuOi{l~FjzdvO (41.87509, -87.62636)
1 ivq~FvoyuOi{l~FjzdvO (83.72538, -175.31074)
2 ecp~FffkvO (41.86691, -87.717)

Finally, we unpack the latlong column into separate latitude and longitude columns to get the desired result.

df['latitude'], df['longitude'] = zip(*df['latlong'])
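
What that line does is easier to see on a plain list. In this small sketch (the values are copied from the table above) zip(*...) turns a list of pairs into one tuple of first elements and one tuple of second elements:

```python
latlong = [(41.87509, -87.62636), (83.72538, -175.31074), (41.86691, -87.717)]

# The * passes each pair as a separate argument to zip, which then
# groups all first elements together and all second elements together.
latitudes, longitudes = zip(*latlong)

print(latitudes)   # (41.87509, 83.72538, 41.86691)
print(longitudes)  # (-87.62636, -175.31074, -87.717)
```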

If you are wondering about zip(*arg), please check this post.
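
Putting the steps together, a self-contained sketch of the whole pipeline might look like the following. The decoder is a condensed version of the function above, not a different algorithm. The final column selection is an addition, made on the assumption that you only want ActivityID, latitude and longitude as in your expected output.

```python
import pandas as pd


def decode_polyline(polyline_str):
    """Return a list of (latitude, longitude) tuples for one encoded polyline."""
    index, lat, lng = 0, 0, 0
    coordinates = []
    while index < len(polyline_str):
        deltas = []
        for _ in range(2):  # latitude delta first, then longitude delta
            shift, result = 0, 0
            while True:
                byte = ord(polyline_str[index]) - 63
                index += 1
                result |= (byte & 0x1F) << shift
                shift += 5
                if byte < 0x20:  # no continuation bit: value complete
                    break
            deltas.append(~(result >> 1) if result & 1 else result >> 1)
        lat += deltas[0]
        lng += deltas[1]
        coordinates.append((lat / 100000.0, lng / 100000.0))
    return coordinates


df = pd.DataFrame({
    'ActivityID': [1, 2],
    'Polyline': ['ivq~FvoyuOi{l~FjzdvO', 'ecp~FffkvO'],
})

# Decode each polyline, then expand to one row per coordinate pair.
result = (
    df.assign(latlong=df['Polyline'].apply(decode_polyline))
      .explode('latlong')
      .reset_index(drop=True)
)

# Split the tuples into two columns and keep only the requested ones.
result['latitude'], result['longitude'] = zip(*result['latlong'])
result = result[['ActivityID', 'latitude', 'longitude']]
print(result)
```

One caveat: the zip(*...) step assumes every polyline decodes to at least one point. An empty string would leave a missing value in latlong after explode, and the unpacking would fail on it.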

Triet Doan