4

I have around 100 list of gps coordinates and I want to draw the line that each of those lists make.

One of the lists plotted using scatter, look somewhat like this:

gps points

Clearly there's a line there;

I tried several methods to sort the gps positions and plot them:

lats = []
lngs = []
with open(filename) as f:
    for line in f:
        lat, lng = line.split("\t")
        lats.append(float(lat))
        lngs.append(float(lng))

def sort_positions(position):
    return position[0]+position[1]

positions= zip(lngs, lats)
positions = sorted(poss, key=sort_positions)
for i, positionin enumerate(positions):
    lng, lat = position
    #plt.scatter(lng, lat)
    try:
        n_lng, n_lat = positions[i+1]
        plt.plot((lng, n_lng),(lat, n_lat), "b")
    except IndexError:
        pass

plt.show()

Sorting by Longitude

def sort_positions(position):
    return position[0]

Longitude Sort

Sorting by Latitude

def sort_positions(position):
    return position[1]

Latitude Sort

Sorting by Summing Both

def sort_positions(position):
    return position[0]+position[1]

Sommation Sort

If the line are mostly straight in one of the axis(latitude/longitude), it plots fine (some minor bumps are still showing)

Here's one of the lists that do plot fine, sorting by latitude.

Latitude Sort 2

I to only plot if the distance between the two points was lesser than 200~500 meters, but I endup with holes in the lines, since there's some missing data.

Distance

Probably I'm doing this wrong. Does anyone know how to plot this lines properly?

EDIT:

In response to rth answer:

Two Approachs

The blue line is using his approach in this questions, the red one is using the method in his other answer.

Ignoring the fact that the red is closing the loop.

There are some limitations in both, first, the red one can't handle too many points, I had to to use 1/90 of the points for both, the blue one generates some weird sharp turns when the points are too clustered (the dark spots in the image), while the red one some weird curves in those places.

f.rodrigues
  • 3,499
  • 6
  • 26
  • 62

3 Answers3

3

The easiest thing to do here would be to figure out why the GPS coordinates were mixed up in the first place and correct that.

If that's not possible, the only thing I can think of is a iterative algorithm, that takes a xy point and decides based on some criteria (e.g. distance, direction of consecutive points, etc) which point should go next.

Something along these lines,

import numpy as np
from scipy.spatial.distance import pdist, squareform

def find_gps_sorted(xy_coord, k0=0):
    """Find iteratively a continuous path from the given points xy_coord,
      starting by the point indexes by k0 """      
    N = len(xy_coord)
    distance_matrix = squareform(pdist(xy_coord, metric='euclidean'))
    mask = np.ones(N, dtype='bool')
    sorted_order = np.zeros(N, dtype=np.int)
    indices = np.arange(N)

    i = 0
    k = k0
    while True:
        sorted_order[i] = k
        mask[k] = False

        dist_k = distance_matrix[k][mask]
        indices_k = indices[mask]

        if not len(indices_k):
            break

        # find next unused closest point
        k = indices_k[np.argmin(dist_k)]
        # you could also add some criterion here on the direction between consecutive points etc.
        i += 1
    return sorted_order, xy_coord[sorted_order]

that you could use as,

 xy_coord = np.random.randn(10, 2)
 sorted_order, xy_coord_sorted = find_gps_sorted(xy_coord, k0=0)

enter image description here

although this may need to be extended.

For instance you could say that in addition to the nearest distance criteria, you don't want the trajectory to turn more then by 90° (if that's something that makes sense for buses) and ignore those points at every iteration. Essentially, you can add enough constraints in this algorithm to get the result you are looking for.

Edit: Since GPS coordinates has some uncertainty, the final step could be to smooth the resulting trace with B-spline interpolation using scipy.interpolate.splprep and playing with the s argument ( see related questions (1), (2))

rth
  • 10,680
  • 7
  • 53
  • 77
  • 1
    What's squareform, pdist and N? The gps positions are somewhat mixed, they have some extra information: Timestamp and ID. The data comes from the bus service in my City, the reason for the missing data is because the buses turn off the GPS when they complete a route and need to move to a new route, or refill, wait in the depot and so on. I was able to draw some route using a single bus of that route, but the other had too few data points that require using all the buses data points, but when I join them together I get this mixed stated. – f.rodrigues Jul 16 '15 at 18:48
  • 1
    'N' undefined, but I assumed it was len(xy_coord) and it worked fine. It works way better than my attempts. It still has some weird turns when the points are too clustered (for instance, inside the depot or near a stop light), but I'll try to fix that later. – f.rodrigues Jul 17 '15 at 07:44
  • @f.rodrigues Sorry about that. See my edit above for the optional smoothing you could do of the of clustered points in the final trace . – rth Jul 17 '15 at 08:46
  • Cool I check your other answer, I tried your approach, but there are some limitations, first it can't handle many points, I had to to use 1/90 of the points, second it generates some weird curves, I added them to the question. – f.rodrigues Jul 17 '15 at 13:38
  • @f.rodrigues Yes, that's because it's trying to reconstruct a closed curve. Setting \per=0` will fix this issue. It should works with as many points as you want, what is the error it gives you otherwise? Well that's the general approach, it might need some more fine tuning to get exactly what you want. – rth Jul 17 '15 at 14:04
  • Oh, I was wondering what that was all about. When I try using all the points it raises `SystemError: error return without exception set` on line `tck, u = splprep(pts.T, u=None, s=0.0, per=0)`, by changing per to 0 it allowed to use 1/10 tho, so I guess there's a hard limit of around 500 points or so. – f.rodrigues Jul 17 '15 at 14:54
  • @f.rodrigues No, there is no hard limit, it just mean that it fails to build a B-spline for that data (would make sense if you have random fluctuations in the depo etc) . `s=0.0` means no smoothing. Try increasing `s` it to some fraction of `len(xy_coord)`, it should work then (see the `splprep`documentation for more details). – rth Jul 17 '15 at 15:04
0

I suspect that all the (nice but) weird plots (except the first one) are because you are trying to sort something which which doesn't allow alphabetic sorting.

There is no reason to plot each line segment separately either. Try to do just:

plt.plot(lngs, lats, 'b')

Here's a sample track I had laying around: I didn't scale the horizontal and vertical axis correctly, so the plot is elongated vertically (too high).

enter image description here

jcoppens
  • 5,306
  • 6
  • 27
  • 47
-1

place each coord into an array(lat, long). place all coords into another array collection(array1, array2, ... arrayn)

sort collection array and then plot.

jobeard
  • 129
  • 6