Reduce GPS data set by distance

Question

I have a set of GPS coordinates, created by a GPS sensor and a Raspberry Pi. I am polling the sensor at 10 Hz and recording the data into an SQL DB on the Pi. The system is mounted on top of my car (and is part of a house-scanning tool for the building industry). The issue is that I drive at different speeds. In some instances I have to stop to let other cars pass, all the while recording GPS locations at 10 Hz.

Once the data is recorded I want to post-process the GPS data and output a reduced list of coordinates so that I have locations approximately 1 metre apart.

I suspect I could use Pandas for this, but I have no idea where to start.

This is an example data set:

51.80359349246259,-4.741180850463812
51.80361005410784,-4.740873766196046
51.80351890237921,-4.7415190658979895
51.803152371942325,-4.74057836870229
51.80352232936482,-4.740392650792621
51.80361261925252,-4.740896906964529
51.803487420307796,-4.7402764541541265
51.80353017387817,-4.74136689657748
51.80287372471039,-4.741218904144232
51.80326530703784,-4.740193742088211

Any help will be very much appreciated.

But, you know that as soon as you drive over 22mph, you cover more than 1 metre every 1/10th second. — quamrana, Jun 28 '21 at 09:33
Hi yes. In that instance I'd need to use the closest location. This is part of a prototype development that will help us determine any required hardware upgrades (high Hz GPS, higher FPS camera etc). Thanks. — plumby101, Jun 28 '21 at 09:48
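A quick check of the arithmetic in the first comment: at a 10 Hz sample rate, consecutive fixes are (speed / 10) apart, so 1 m spacing corresponds to 10 m/s. The sketch below assumes a constant speed and uses the exact conversion 1 mph = 0.44704 m/s; the function names are illustrative.

```python
MPH_TO_MS = 0.44704  # exact: 1 mph = 0.44704 m/s


def spacing_m(speed_mph, rate_hz=10.0):
    """Distance travelled between consecutive fixes, in metres."""
    return speed_mph * MPH_TO_MS / rate_hz


def max_speed_mph(spacing=1.0, rate_hz=10.0):
    """Highest speed at which fixes are still at most `spacing` metres apart."""
    return spacing * rate_hz / MPH_TO_MS


print(round(max_speed_mph(), 2))  # 22.37
print(round(spacing_m(30), 2))    # 1.34
```

So above roughly 22 mph the raw 10 Hz data is already sparser than 1 m, and any reduction step can only pick the nearest available fix.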

Ferris · Answer 1 · answered Jun 29 '21 at 01:55

How about using a geohash to reduce consecutive readings at the same location to a single point?

http://en.wikipedia.org/wiki/Geohash

On the precision of each geohash length, see: https://gis.stackexchange.com/questions/115280/what-is-the-precision-of-geohash

Geohash length  Max. X-axis error (km)
1               ± 2500
2               ± 630
3               ± 78
4               ± 20
5               ± 2.4
6               ± 0.61
7               ± 0.076
8               ± 0.019
9               ± 0.0024
10              ± 0.00060
11              ± 0.000074

# !pip install pygeodesy
from pygeodesy import geohash


def df_add_geohash(df, precision=7, col_lat='lat', col_lng='lon', geo_col='geo'):
    """Return a copy of df with a geohash of the given length in column geo_col."""
    df_to_convert = df.copy()
    cond = df_to_convert[col_lat].notnull()  # skip rows without coordinates
    df_to_convert.loc[cond, geo_col] = df_to_convert[cond].apply(
        lambda x: geohash.encode(x[col_lat], x[col_lng], precision=precision),
        axis=1)
    return df_to_convert


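# df: a DataFrame with 'lat' and 'lon' columns, e.g. the sample data from the question
dfn = df_add_geohash(df, 7, 'lat', 'lon')
# keep a row only when its geohash differs from the previous row's
cond = dfn['geo'] == dfn['geo'].shift(1)
print(dfn[~cond])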

#          lat       lon      geo
# 0  51.803593 -4.741181  gchwsne
# 3  51.803152 -4.740578  gchwsnk
# 4  51.803522 -4.740393  gchwsns
# 5  51.803613 -4.740897  gchwsne
# 6  51.803487 -4.740276  gchwsns
# 7  51.803530 -4.741367  gchwsne
# 8  51.802874 -4.741219  gchwsn7
# 9  51.803265 -4.740194  gchwsnk

This is quite an elegant solution, except the difference between a precision of 7 and 8 is huge. At 8 it shows all of my GPS data (167 results) and at 7 it shows 42, with a point location difference of around 40 metres. — plumby101, Jun 28 '21 at 13:34
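The jump between lengths 7 and 8 follows from how a geohash splits space: each character adds 5 bits, assigned alternately to longitude and latitude (starting with longitude), and east-west cells also narrow with cos(latitude). The sketch below estimates cell dimensions at the question's latitude (about 51.8° N). It assumes approximate metres-per-degree constants, which is fine for sizing cells but not for surveying; `geohash_cell_size` is an illustrative name, not a pygeodesy function.

```python
import math

# Approximate metres per degree (assumed constants; adequate for sizing cells)
M_PER_DEG_LAT = 110_574.0
M_PER_DEG_LON_EQUATOR = 111_320.0


def geohash_cell_size(length, lat_deg):
    """Return (width_m, height_m) of a geohash cell of `length` chars at lat_deg."""
    bits = 5 * length            # each base-32 character encodes 5 bits
    lon_bits = (bits + 1) // 2   # bits alternate lon, lat, lon, ... starting with lon
    lat_bits = bits // 2
    lon_span = 360.0 / 2 ** lon_bits   # cell width in degrees of longitude
    lat_span = 180.0 / 2 ** lat_bits   # cell height in degrees of latitude
    width = lon_span * M_PER_DEG_LON_EQUATOR * math.cos(math.radians(lat_deg))
    height = lat_span * M_PER_DEG_LAT
    return width, height


for n in range(7, 11):
    w, h = geohash_cell_size(n, 51.8)
    print(f"length {n}: {w:6.2f} m wide x {h:6.2f} m tall")
```

By this estimate, cells here are roughly 95 m x 150 m at length 7 and roughly 24 m x 19 m at length 8, which fits the large jump described in the comment. Lengths 9 and 10 come out at roughly 3 m x 5 m and under 1 m respectively, so no single geohash length gives cells of about 1 m. At the equator the length-7 width is about 153 m, i.e. twice the ±0.076 km in the table.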

score 1 · Answer 2 · answered Jun 29 '21 at 02:11

If you want a more precise result, you could calculate the distance between each recorded point and the previous one, and filter out the points that are less than 1 m apart.

import pandas as pd

df = pd.DataFrame(
    [{'lat': 51.803593492462596, 'lon': -4.741180850463811},
     {'lat': 51.80361005410785, 'lon': -4.740873766196046},
     {'lat': 51.80351890237921, 'lon': -4.7415190658979895},
     {'lat': 51.80315237194233, 'lon': -4.74057836870229},
     {'lat': 51.803522329364824, 'lon': -4.7403926507926215},
     {'lat': 51.80361261925252, 'lon': -4.740896906964529},
     {'lat': 51.803487420307796, 'lon': -4.740276454154127},
     {'lat': 51.80353017387817, 'lon': -4.74136689657748},
     {'lat': 51.80287372471039, 'lon': -4.741218904144231},
     {'lat': 51.80326530703784, 'lon': -4.740193742088211}]
)

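# coordinates of the previous reading (NaN for the first row)
df['lat_pre'] = df['lat'].shift(1)
df['lon_pre'] = df['lon'].shift(1)

# !pip install geopy
# https://geopy.readthedocs.io/en/stable/#installation
from geopy.distance import geodesic

# geodesic distance in metres to the previous reading
cond = df['lat_pre'].notnull()
df.loc[cond, 'distance'] = df[cond].apply(
    lambda row: geodesic((row.lat, row.lon), (row.lat_pre, row.lon_pre)).m,
    axis=1)

# drop rows less than 1 m from the previous reading (the first row's NaN is kept)
cond = df['distance'] < 1
print(df[~cond])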

#          lat       lon    lat_pre   lon_pre   distance
# 0  51.803593 -4.741181        NaN       NaN        NaN
# 1  51.803610 -4.740874  51.803593 -4.741181  21.262108
# 2  51.803519 -4.741519  51.803610 -4.740874  45.652403
# 3  51.803152 -4.740578  51.803519 -4.741519  76.639257
# 4  51.803522 -4.740393  51.803152 -4.740578  43.110166
# 5  51.803613 -4.740897  51.803522 -4.740393  36.204379
# 6  51.803487 -4.740276  51.803613 -4.740897  45.007709
# 7  51.803530 -4.741367  51.803487 -4.740276  75.367133
# 8  51.802874 -4.741219  51.803530 -4.741367  73.748842
# 9  51.803265 -4.740194  51.802874 -4.741219  83.059036
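One caveat with this filter: each row is compared with the previous reading rather than with the last point that was kept. If the car creeps forward (or the GPS jitters while parked) in steps under 1 m, every one of those rows is dropped even after the car has moved several metres. Below is a minimal sketch of a variant that measures from the last kept point instead. It swaps geopy for a plain spherical haversine (mean Earth radius assumed, within a fraction of a percent at these distances) so it needs no extra packages; `haversine_m`, `thin_by_distance` and the `track` readings are illustrative.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2, radius_m=6_371_000.0):
    """Great-circle distance in metres, treating the Earth as a sphere."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))


def thin_by_distance(points, min_dist_m=1.0):
    """Keep a (lat, lon) point only if it is >= min_dist_m from the last kept point."""
    kept = []
    for lat, lon in points:
        if not kept or haversine_m(*kept[-1], lat, lon) >= min_dist_m:
            kept.append((lat, lon))
    return kept


# Hypothetical readings: a car creeping north in ~0.1 m steps, then ~0.9 m
track = [(51.8036, -4.7412), (51.803601, -4.7412), (51.803602, -4.7412), (51.80361, -4.7412)]
print(thin_by_distance(track))  # [(51.8036, -4.7412), (51.80361, -4.7412)]
```

On the question's sample data every point is more than 1 m from its predecessor, so all ten are kept either way. On `track`, a previous-row filter would drop all three later readings (steps of about 0.11, 0.11 and 0.89 m), while this version keeps the last one because it is about 1.1 m from the first.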

score 0 · Answer 3 · answered Jun 28 '21 at 09:31

library(data.table)
library(hutils)
setDT(gpsdata)
setDT(busdata.data)

gps_orig <- copy(gpsdata)
busdata.orig <- copy(busdata.data)

setkey(gpsdata, lat)

# Just to take note of the originals
gpsdata[, gps_lat := lat + 0]
gpsdata[, gps_lon := lon + 0]

busdata.data[, lat := latitude_bustops + 0]
busdata.data[, lon := longitude_bustops + 0]


setkey(busdata.data, lat)

gpsID_by_lat <- 
  gpsdata[, .(id), keyby = "lat"]


By_latitude <- 
  busdata.data[gpsdata, 
               on = "lat",

               # within 0.5 degrees of latitude
               roll = 0.5, 
               # +/-
               rollends = c(TRUE, TRUE),

               # and remove those beyond 0.5 degrees
               nomatch=0L] %>%
  .[, .(id_lat = id,
        name_lat = name,
        bus_lat = latitude_bustops,
        bus_lon = longitude_bustops,
        gps_lat,
        gps_lon),
    keyby = .(lon = gps_lon)]

setkey(busdata.data, lon)

By_latlon <-
  busdata.data[By_latitude,
               on = c("name==name_lat", "lon"),

               # within 0.5 degrees of longitude
               roll = 0.5, 
               # +/-
               rollends = c(TRUE, TRUE),
               # and remove those beyond 0.5 degrees
               nomatch=0L]

By_latlon[, distance := haversine_distance(lat1 = gps_lat, 
                                           lon1 = gps_lon,
                                           lat2 = bus_lat,
                                           lon2 = bus_lon)]

# haversine_distance returns kilometres, so this keeps matches within 200 m
By_latlon[distance < 0.2]

I found this answer on https://stackoverflow.com/questions/53212103/. I'm not sure how to implement it (or what language it is!). — plumby101, Jun 28 '21 at 13:37

score 0 · Accepted Answer · answered Jun 30 '21 at 10:10

I worked out a solution based on finding the distance, as suggested by @Ferris. The 'mpu.haversine_distance' function returns the distance in kilometres between two lat/lng pairs, which I multiply by 1000 to get metres. I then keep a running total of these distances, and once it exceeds 1 metre I report that lat/lng and reset the total. The threshold can be adjusted to 3 metres, etc.
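The same accumulate-and-reset logic can be separated from the database so it can be tried on a plain list of (lat, lon) tuples, such as the sample coordinates in the question. This is a sketch, not the poster's code: it swaps mpu for a spherical haversine (mean Earth radius assumed) so it has no dependencies, the function names and `track` data are illustrative, and, unlike the loop in the answer, it also returns the starting point.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2, radius_m=6_371_000.0):
    """Great-circle distance in metres, treating the Earth as a sphere."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))


def thin_by_path_length(points, step_m=1.0):
    """Report a point each time the distance travelled since the last report exceeds step_m."""
    if not points:
        return []
    reported = [points[0]]  # the starting fix (the answer's loop does not print this one)
    travelled = 0.0         # running total in metres, like `counter` in the answer
    for prev, cur in zip(points, points[1:]):
        travelled += haversine_m(*prev, *cur)
        if travelled > step_m:
            reported.append(cur)
            travelled = 0.0
    return reported


# Hypothetical track: due north in 0.4 m steps
track = [(51.8 + k * 0.4 / 111_194.9, -4.74) for k in range(7)]
print(len(thin_by_path_length(track)))  # 3: the start, then after 1.2 m and 2.4 m
```

One consequence of summing the path: GPS jitter while the car is stopped still adds to the running total, so a parked car can keep producing points. Comparing each reading against the last reported point instead would be less sensitive to that.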

import mpu  # pip install mpu


def processTheSet(batch):
    # mydb: an open MySQL connection, created elsewhere in the script
    mycursorll = mydb.cursor()
    sqlll = "SELECT latt, longg FROM interPol WHERE batchID = %s ORDER BY `fileTime`"
    mycursorll.execute(sqlll, (batch,))
    firstResult = mycursorll.fetchone()
    if firstResult is None:  # no rows for this batch
        return 0
    firstLat = float(firstResult[0])
    firstLng = float(firstResult[1])
    myresultll = mycursorll.fetchall()  # the remaining rows
    count = 0    # number of points reported
    counter = 0  # distance travelled (m) since the last reported point
    for x in myresultll:
        thisLat = float(x[0])
        thisLong = float(x[1])
        # mpu.haversine_distance returns km; convert to metres
        dist = mpu.haversine_distance((firstLat, firstLng), (thisLat, thisLong)) * 1000
        firstLat = thisLat
        firstLng = thisLong
        counter = counter + dist
        if counter > 1:  # threshold in metres; e.g. use 3 for ~3 m spacing
            count = count + 1
            counter = 0
            print(thisLong, ",", thisLat)
    return count
