GPS downsampling

Question

My goal is to downsample my input data to one point every 100 m, while keeping the first and last point of each line.

My problem is that I get far fewer lines than I should when I downsample, and I don't know how to get the last line.

I hope this is clear enough for someone to understand.

I want to make this input:
Line 20130904_0848.nmea
$GPGGA,111936.00,5849.37538,N,01739.88263,,E,2,09,00.9,00004.43,M,0024.87,M,007,0734*42
$GPGGA,111936.00,5849.37548,N,01739.88240,,E,2,09,00.9,00004.43,M,0024.87,M,007,0734*44
$GPGGA,111936.00,5849.37556,N,01739.88216,,E,2,09,00.9,00004.43,M,0024.87,M,007,0734*48
$GPGGA,111936.00,5849.37569,N,01739.88193,,E,2,09,00.9,00004.43,M,0024.87,M,007,0734*4a
$GPGGA,111936.00,5849.37581,N,01739.88171,,E,2,09,00.9,00004.43,M,0024.87,M,007,0734*40
$GPGGA,111936.00,5849.69118,N,01739.89674,,E,2,09,00.9,00004.43,M,0024.87,M,007,0734*4c
EOL 

Line 20130904_0926.nmea
$GPGGA,111936.00,5849.67569,N,01739.98426,,E,2,09,00.9,00004.43,M,0024.87,M,007,0734*45
$GPGGA,111936.00,5849.67593,N,01739.98453,,E,2,09,00.9,00004.43,M,0024.87,M,007,0734*42
$GPGGA,111936.00,5849.67616,N,01739.98479,,E,2,09,00.9,00004.43,M,0024.87,M,007,0734*44
....

look like this:

Line 20081002-1119.nmea
58.853952   13.309779   0.00
58.853907   13.310688   101.15
58.853858   13.311593   100.72
58.853811   13.312498   100.62
58.853764   13.313402   100.59
58.853752   13.313660   28.70

EOL

Line 20081002-1119.nmea
58.853952   13.309779   0.00
58.853907   13.310688   101.15
58.853858   13.311593   100.72
58.853811   13.312498   100.62
58.853764   13.313402   100.59
...

This is my code so far

from math import sin, cos, sqrt, atan2, radians

coord=[]
coord1=None
def distance(coord1,coord2): #Haversin
    lat1,lon1=coord1
    lat2,lon2=coord2
    dlat = radians(lat2-lat1)
    dlon = radians(lon2-lon1)
    a = sin(dlat/2) * sin(dlat/2)
    + cos(radians(lat1))*cos(radians(lat2))*sin(dlon/2)*sin(dlon/2)
    c = 2 *atan2(sqrt(a),sqrt(1-a))
    s = (6367*c)*1000 #meter
    return s

# "with open(...) as f" closes the file automatically when the block ends, so you don't need to close it yourself

with open('asko_nav_2013.nmea', 'r') as indata:         # input file; 'r' opens it for reading
    with open('asko_nav_out.txt', 'w') as outdata:      # output file; 'w' opens it for writing (creates or overwrites the file)


        while True:
            line = indata.readline()
            if not line:
                break
            if line.startswith('EOL'):  # EOL (end of line) marker: write it to the output
                outdata.writelines("EOL")
                coord1=None
            elif line.startswith('Line'): 
                LineID=line
                outdata.writelines('\n%s' %LineID)
            elif line.startswith('$GPGGA'):  # a $GPGGA sentence: split it into comma-separated fields
                data=line.split(",")

                # Use only the coordinates from the asko input file (fields 2 and 4),
                # converting them from DDMM.MMMM to DD.DDDDDD
                LAT=(data[2])
                LAT_D=LAT[0:2]               
                LATID=float(LAT_D)

                LAT_M=LAT[2:]
                LATM=float(LAT_M)
                LATIM = float(LATM) / 60.0

                latitude=(LATID + LATIM)                  

                LON=(data[4])
                LON_D=LON[1:3]
                LONGD=float(LON_D)

                LON_M=LON[3:]
                LONM=float(LON_M)
                LONGM = float(LONM) / 60.0

                longitude=(LONGD + LONGM)

                if coord1 is None:

                # The first time through the loop "coord1" is None
                    outdata.writelines('%0.6f\t%0.6f\t%s \n'%(latitude,longitude,0))
                    coord1=(latitude,longitude)
                else:
                    coord2=(latitude,longitude)
                    dist=distance(coord1,coord2)

                    if dist <100:
                        continue
                    outdata.writelines('%0.6f\t%0.6f\t%f\n' % (latitude,longitude,dist))
                    coord1=coord2

I improved the indentation as much as possible and removed all your `>` in the beginning of the line as if you took this from a mailing list. Can you please improve the indentation further because you're the only one who knows which blocks belongs where. — Torxed, Nov 13 '15 at 10:10
Think so, I can't tell but it looks OK, your comments though are indented one line left I think. But the most important part is the code blocks in say `if:` etc. — Torxed, Nov 13 '15 at 10:54

score 0 · Accepted Answer · answered Nov 13 '15 at 11:05

Your code could do with a little bit of reorganising to make it clearer. You need to add an additional write whenever EOL is seen, to cover the case where the last point is less than 100m from the previously written one:

from math import sin, cos, sqrt, atan2, radians    

def distance(coord1, coord2): #Haversin
    lat1,lon1=coord1
    lat2,lon2=coord2
    dlat = radians(lat2-lat1)
    dlon = radians(lon2-lon1)
    a = sin(dlat/2) * sin(dlat/2)
    + cos(radians(lat1))*cos(radians(lat2))*sin(dlon/2)*sin(dlon/2)
    c = 2 *atan2(sqrt(a),sqrt(1-a))
    s = (6367*c)*1000 #meter
    return s

def get_coordinates(data):
    # Use only the coordinates from the asko input file (fields 2 and 4)
    # Converting the coordinates from DDMM.MMMM to DD.DDDDDD

    LAT = (data[2])
    LAT_D = LAT[0:2]               
    LATID = float(LAT_D)

    LAT_M = LAT[2:]
    LATM = float(LAT_M)
    LATIM = float(LATM) / 60.0

    latitude = (LATID + LATIM)                  

    LON = (data[4])
    LON_D = LON[0:3]     # longitude degrees take three digits (DDDMM.MMMM)
    LONGD = float(LON_D)

    LON_M = LON[3:]
    LONM = float(LON_M)
    LONGM = float(LONM) / 60.0

    longitude = (LONGD + LONGM)

    return (latitude, longitude)


coord1 = None
dist = None     # distance from coord1 to the most recent point; None until a second point is read

# "with open(...)" closes both files automatically when the block ends, so you don't need to close them yourself

with open('asko_nav_2013.nmea', 'r') as indata, open('asko_nav_out.txt', 'w') as outdata:
    for line in indata:
        if line.startswith('EOL'):  # EOL (end of line) marker
            # Write the last point of the line if it was not already written
            if dist is not None and dist < 100:
                outdata.write('%0.6f\t%0.6f\t%f\n' % (latitude, longitude, dist))
            outdata.write("\nEOL\n")
            coord1 = None   # Reset the first coordinate
            dist = None     # Forget the distance from the previous line
        elif line.startswith('Line'): 
            outdata.write('\n%s' % line)
        elif line.startswith('$GPGGA'):  # a $GPGGA sentence: split it into comma-separated fields
            data=line.split(",")
            latitude, longitude = get_coordinates(data)

            if coord1:
                coord2 = (latitude, longitude)
                dist = distance(coord1, coord2)

                if dist >= 100:
                    outdata.write('%0.6f\t%0.6f\t%f\n' % (latitude, longitude, dist))
                    coord1 = coord2         
            else:
                # The first time through the loop "coord1" is None
                outdata.write('%0.6f\t%0.6f\t0.0 \n' % (latitude, longitude))
                coord1 = (latitude, longitude)

For your given input, this produces the following output file:

Line 20130904_0848.nmea
58.822923   17.664710   0.0 
58.828186   17.664946   584.888514

EOL

Line 20130904_0926.nmea
58.827928   17.666404   0.0 
58.827936   17.666413   0.870480

EOL

You also need to reset coord1 whenever EOL is detected to make sure 0 is displayed again for the first entry.

It is a bit difficult to see if this completely solves matters as your sample data does not seem to tally with your expected output.
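
As a side note, the DDMM.MMMM conversion can be written so that it works for both latitude (two degree digits) and longitude (three degree digits). This is only a sketch: the name `nmea_to_decimal` and the hemisphere argument are my own assumptions, not something from your file format description.

```python
def nmea_to_decimal(value, hemisphere):
    """Convert an NMEA DDMM.MMMM or DDDMM.MMMM string to signed decimal degrees."""
    # The minutes always occupy the two digits before the decimal point,
    # so everything in front of them is whole degrees.
    point = value.index('.')
    degrees = float(value[:point - 2])
    minutes = float(value[point - 2:])
    decimal = degrees + minutes / 60.0
    # South and west are negative by convention.
    return -decimal if hemisphere in ('S', 'W') else decimal


print('%0.6f' % nmea_to_decimal('5849.37538', 'N'))
print('%0.6f' % nmea_to_decimal('01739.88263', 'E'))
```

Finding the decimal point first avoids hard-coding a different slice for latitude and longitude.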

Thanks for the reorganizing, it was much needed :D The distances are supposed to be around 100m except for the first and last one. The first one should be 0 (which it is) and the last should simply be the last point in the line, which is what I need help with — Cecilia Nilsson, Nov 13 '15 at 12:17
I do get the last line now and every 100m - thank you <3, but I get too few; every 100m should give a lot more than it does — Cecilia Nilsson, Nov 13 '15 at 12:54
There could be an issue with your distance function, I've not checked the logic of that. — Martin Evans, Nov 13 '15 at 12:56
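
One thing worth checking in `distance`: in Python a line that begins with `+` is a separate statement unless the whole expression is wrapped in parentheses. As posted, the `cos(...)` term is evaluated and thrown away, so only the latitude difference contributes to the result, which would make east-west movement count as zero distance. A sketch with the expression parenthesised, keeping the 6367 km radius from the posted code (the constant name `EARTH_RADIUS_M` is mine):

```python
from math import sin, cos, sqrt, atan2, radians

EARTH_RADIUS_M = 6367 * 1000  # same radius as the posted code, in metres


def distance(coord1, coord2):
    """Haversine distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1 = coord1
    lat2, lon2 = coord2
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    # The parentheses make the second line part of the same expression
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    c = 2 * atan2(sqrt(a), sqrt(1 - a))
    return EARTH_RADIUS_M * c


# Two made-up points on the same latitude, 0.001 degrees of longitude apart.
# The version with the expression split over two lines returns 0.0 here.
print(distance((58.0, 17.0), (58.0, 17.001)))  # roughly 59 m
```

If most of the boat's track runs east-west, this alone could explain why so few points pass the 100m test.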

score 0 · Answer 2 · answered Nov 13 '15 at 14:43

Addressing the second issue concerning fewer result lines than expected: You are providing too little information about the nature of your problem and the input data you are processing. Sampling your input "for every 100m" could mean something different if your input data is sampled from a trajectory travelled by a moving object, especially if the motion is not purely linear.

Imagine that your input describes coordinates obtained by measuring GPS coordinates in regular intervals while moving along a circle with radius smaller than, say, 15m. Then no matter how many data points your input provides, the output for your proposed solution will never be longer than two lines, because no two points along that curve can have an absolute distance greater than 100m. This might explain why you are seeing fewer lines in the output than expected.
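
To put numbers on that thought experiment, here is a small sketch. It uses flat x/y coordinates in metres instead of latitude/longitude to keep the arithmetic obvious, and the 15m radius and ten laps are made-up values for illustration.

```python
from math import cos, sin, hypot, radians

RADIUS_M = 15.0   # hypothetical circle radius
LAPS = 10         # hypothetical number of laps

# One sample per degree of arc, for LAPS full laps
track = [(RADIUS_M * cos(radians(d)), RADIUS_M * sin(radians(d)))
         for d in range(360 * LAPS + 1)]

start = track[0]
# Largest straight-line distance from the first sample to any later sample
max_straight = max(hypot(x - start[0], y - start[1]) for x, y in track)
# Distance actually travelled: sum of the steps between consecutive samples
path_length = sum(hypot(x2 - x1, y2 - y1)
                  for (x1, y1), (x2, y2) in zip(track, track[1:]))

print('farthest straight-line distance from start: %.1f m' % max_straight)
print('distance travelled along the track: %.1f m' % path_length)
```

The straight-line distance never exceeds the diameter (30m), while the distance travelled is over 900m, so a straight-line threshold of 100m would never trigger on this track.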

If you mean to sample the input at every 100m travelled, you would have to sum over all distances between input samples since the last point sampled for output and use that instead of dist. Modifying Martin's reorganised code, it could be done like this (some lines omitted for brevity):

coord1 = None
coord_last = None  # holds coordinate of last input sample
dist = 0.0         # total distance travelled since coord1
# [...]
with open('asko_nav_2013.nmea', 'r') as indata, open('asko_nav_out.txt', 'w') as outdata:
    for line in indata:
    # [...]
            if coord1:
                coord2 = (latitude, longitude)
                delta = distance(coord_last, coord2)
                dist += delta
                coord_last = coord2

                if dist >= 100:
                    outdata.write('%0.6f\t%0.6f\t%f\n' % (latitude, longitude, dist))
                    coord1 = coord2
                    dist = 0.0
            else:
                # The first time through the loop "coord1" is None
                outdata.write('%0.6f\t%0.6f\t0.0 \n' % (latitude, longitude))
                coord1 = (latitude, longitude)
                coord_last = coord1
                dist = 0.0
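
The same idea as a self-contained function that can be tried without the NMEA file. Treat it as a sketch under assumptions: the names `haversine_m` and `downsample_by_path_length` are mine, the input is taken to be a list of (lat, lon) tuples for one survey line, and the test track is made up.

```python
from math import sin, cos, sqrt, atan2, radians


def haversine_m(p, q, radius_m=6367000.0):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    dlat = radians(q[0] - p[0])
    dlon = radians(q[1] - p[1])
    a = (sin(dlat / 2) ** 2
         + cos(radians(p[0])) * cos(radians(q[0])) * sin(dlon / 2) ** 2)
    return radius_m * 2 * atan2(sqrt(a), sqrt(1 - a))


def downsample_by_path_length(points, step_m=100.0):
    """Return (lat, lon, travelled) rows: the first point, one row each time
    at least step_m has been travelled, and the last point."""
    rows = []
    previous = None
    travelled = 0.0
    for point in points:
        if previous is None:
            rows.append((point[0], point[1], 0.0))
        else:
            travelled += haversine_m(previous, point)
            if travelled >= step_m:
                rows.append((point[0], point[1], travelled))
                travelled = 0.0
        previous = point
    # Keep the final point if it was not already emitted
    if previous is not None and travelled > 0.0:
        rows.append((previous[0], previous[1], travelled))
    return rows


# Hypothetical track: 50 points heading north in steps of 0.0001 degrees (about 11 m)
track = [(58.0 + i * 0.0001, 17.0) for i in range(50)]
for row in downsample_by_path_length(track):
    print('%0.6f\t%0.6f\t%0.2f' % row)
```

Keeping the logic in a function that takes plain tuples makes it easy to call once per survey line and to check against a small hand-made track before running it on the full file.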

There is a little more data now, not as much as I hoped, but still more than before, so thanks a lot :D — Cecilia Nilsson, Nov 13 '15 at 16:44
Can you elaborate a bit on why you think there should be more data points in the output? It would help a great deal if you provide more specific information on how your data was obtained (moving vehicle?), why you intend to reduce the number of samples (visualisation?) and by what metric you determine the output size to be satisfactory. — omahdi, Nov 13 '15 at 21:04
Oh sure! It's from a boat that has been mapping the seafloor. The data will be used in QGIS to make a map :) The downsampling is done because there are about 70 000 rows of data, so the 200 I get after the downsampling is a bit too few — Cecilia Nilsson, Nov 13 '15 at 21:39
Thanks for clearing that up. You are not primarily interested in a time series then, in which case you should disregard my suggestion above. You probably want to look into something like [Resampling irregularly spaced data to a regular grid in Python](http://stackoverflow.com/questions/3864899/resampling-irregularly-spaced-data-to-a-regular-grid-in-python). I suppose QGIS has resampling filters, too. Simply throwing away measurements seems like quite a waste, esp. if you do not know if the "boat" followed a regular search trajectory (might help to visualise that!) — omahdi, Nov 17 '15 at 12:48
