2

I use Python:

I have 2 arrays of GPS points - lon and lat (more than 500 000 points).

I have 1 array of date-time.

lon = numpy.array(lon)
lat = numpy.array(lat)
dt = numpy.array(dt)

I have a location error (GPS sensor error). For example 15 meters.

GPS_sensor_error = 0.015    

I need to exclude a GPS_sensor_error from coordinates that there were no asterisks on a track.

enter image description here

(I don't draw a point with identical coordinates)

enter image description here

How I am able to do it?

Now:

  1. I calculate distance between points.

  2. I find the minimum distance, if it less GPS_sensor_error, then I average lon, lat.

  3. repeat 1.

  4. repeat 2.

  5. repeat until all distances won't be more GPS_sensor_error

Update:

     lon = numpy.array()
     lat = numpy.array()

     flag = True
     while flag:
        lon1 = lon[:-1]
        lon2 = lon[1:]
        lat1 = lat[:-1]
        lat2 = lat[1:]

        '''distance'''
        x = (lon2 - lon1)
        y = (lat2 - lat1)
        d = numpy.sqrt(x * x + y * y)

        min = numpy.min(d)
        if min < GPS_sensor_error:
            j = numpy.where(d == min)[0][0]

            lon[j] = (lon[j] + lon[j + 1]) / 2
            lat[j] = (lat[j] + lat[j + 1]) / 2

            lon = numpy.delete(lon, j + 1)
            lat = numpy.delete(lat, j + 1)

        else:
            flag = False

Bypass on all points works at a pure python very long... Prompt please, how to implement it using scipy, numpy?


Thanks

P.s. probably already there is a GPS filter in scipy, numpy?

Olga
  • 1,395
  • 2
  • 25
  • 34
  • I think I understand in principle what you want to achieve, however, details are unclear from your question. Could you give a simple example (3 or 4 points), that shows what you mean by distance, what means `average lon,lat` exactly in this context? – tzelleke Jan 10 '13 at 12:33
  • Why don't you include your slow Python code? The way in which you want to conduct the averaging is not very clear from your deescription. – Jaime Jan 10 '13 at 14:51
  • I added the slow Python code to the description. – Olga Jan 11 '13 at 13:46
  • @Olga, in your slow algorithm `lon` and `lat` are shrinking with each iteration: `lon = numpy.delete(lon, j + 1); lat = numpy.delete(lat, j + 1)`. Now in a comment on @spam_eggs answer you explicitely state, that the sizes of `lon` and `lat` should be preserved???? – tzelleke Jan 11 '13 at 14:16
  • As I said in my answer: Dont use pythagoras for the distance. It gets terribly inaccurate at higher and lower latitudes. – RickyA Jan 11 '13 at 15:56
  • @TheodrosZellekeyes, the sizes of lon and lat shall be saved. I didn't include in an example part of a code in which I recover the initial size of lon and lat. – Olga Jan 11 '13 at 17:58

2 Answers2

5

From a data science perspective what you are doing is not correct. You cant just use the average error distance as a cutoff and think your data will be more correct. The two points you are comparing can have an error more or less than 15 m they can shift toward each other or move away of each other. And if you don't have another exact dataset there is no way of telling what would be the correct point. You can't make this dataset more precise.

However I think you goal is to simplify your dataset, not to make it more accurate. For that you can use the Douglas–Peucker algorithm. I would suggest you load your data in an Postgis enabled database (Postgresql + postgis) and then use the simplify function. This will require some db setup time, but then it will speed you up greatly. However if you want it in pure python this SO question has a very nice snippet.

BTW. If your are doing distance calculations with lat,lon do not use Pythagoras. It is not valid since lat,lon is not Euclidean. Use the haversine algorithm.

Community
  • 1
  • 1
RickyA
  • 15,465
  • 5
  • 71
  • 95
  • 2
    If this idea works for you, you may want to look at the [Shapely](http://pypi.python.org/pypi/Shapely) package, which also has a `simplify` method. From its docstring: 'Returns a simplified geometry produced by the Douglas-Puecker algorithm'. – subnivean Jan 10 '13 at 20:14
  • Second that. Shapely is also a very usable package for these kind of problems. – RickyA Jan 11 '13 at 15:53
  • @RickyA, thanks, for calculation of distance I will use haversine algorithm. – Olga Jan 11 '13 at 18:02
2

You can easily do all your calculations using numpy primitives only and no python looping.

First define your distance function as a function that operates on numpy arrays (I assume you did that already ..):

def dist(lon1, lat1, lon2, lat2):
    """Compute the distance between (lon1, lat1) and (lon2, lat2). 
       Both may be numpy arrays."""
    ...

Then apply it to your data as so:

d = dist(lon[:-1], lat[:-1], lon[1:], lat[1:])

This notation means you will compare the ith point to the i+1th.

Next find the indices where d is greater than your threshold:

I = d > GPS_sensor_error

Now keep only those and the first point!

lon_out = numpy.hstack([[lon[0]], lon[1:][I]]) # could also use numpy.where
lat_out = numpy.hstack([[lat[0]], lat[1:][I]])

Update:

If you want to keep the same number of points, ie set lon[i] to the last good value, use the following trick instead of the previous two lines:

idx, = numpy.where(I)
idx = numpy.hstack([[0], idx])
J = numpy.cumsum(I) # the trick
lon_out = lon[idx[J]]
lat_out = lat[idx[J]]
spam_eggs
  • 1,058
  • 6
  • 11
  • thanks, but this decision reduces quantity of originating points. As a result: len(lot_out) <<< len(lot). It is necessary: len(lot_out) = len(lot). I probably incorrectly asked a question... – Olga Jan 10 '13 at 10:41
  • 1
    No problem, I updated my answer to do that. Indeed to do it in pure numpy requires some craftiness... – spam_eggs Jan 10 '13 at 14:13
  • If you compare point i to point i+1 and remove all for which the distance is less than your threshold, then you will miss the cumulative effect of many small steps in the same direction. In other words: the distance covered in 1 step may be below the threshold, but the distance covered in 10 small steps may be above the threshold! – Daan Jan 11 '13 at 07:24
  • @Daan I agree! I only chose to address the programming aspect of the algo Olga outlined, not the correctness ... but you're right that for starters the right algo should be chosen – spam_eggs Jan 11 '13 at 10:36
  • thanks spam_eggs, your code is useful to me in the future:) (I added the slow Python code to the description). – Olga Jan 11 '13 at 13:55
  • I didn't include in my example part of a code in which I recover the initial size of lon and lat. – Olga Jan 11 '13 at 18:06