I am working on an XQuery library for extracting simple geospatial information from GPS files (it's called GPXQuery and is available on GitHub). GPX files often contain tracks of GPS coordinates and can get quite big; my biggest test file has 20'000 points in it. GPX is very simple:
<gpx version="1.1" xmlns="http://www.topografix.com/GPX/1/1">
  <trk>
    <name>Berkeley Test Walk #1</name>
    <trkseg>
      <trkpt lon="-122.26794633083045" lat="37.878523925319314">
        <ele>78.4000015258789</ele>
      </trkpt>
      ...
    </trkseg>
  </trk>
</gpx>
Inside <trkseg> there is a long sequence of <trkpt> elements, representing all the recorded GPS coordinates. I want to be able to process at least 100'000 points, hopefully more.
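For orientation, the recorded points of a track sit at this path (a sketch, assuming $GPX is bound to the parsed document and the gpx prefix is bound to the GPX 1.1 namespace):

```xquery
(: all points of the first track; each <trkseg> contributes its <trkpt> children :)
$GPX/gpx:gpx/gpx:trk[1]/gpx:trkseg/gpx:trkpt
```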
My first slightly complicated function calculates the distance of a recorded GPS track. The math does not matter here. The problem is that I run into stack issues: even for my 20'000-point example, my standard Saxon setup already fails. I am sure this could be fixed with a more generous memory allocation, but I am wondering whether something more fundamental is going on.
My function should qualify for tail-call optimization, but that's a bit hard to tell and may differ from product to product. Here are the functions; they are invoked as gpxquery:trk-distance($GPX/gpx:gpx)[1] to get the distance of the first track in a given GPX document $GPX:
module namespace gpxquery = "https://github.com/dret/GPXQuery";
declare namespace xsd = "http://www.w3.org/2001/XMLSchema";
declare namespace math = "http://www.w3.org/2005/xpath-functions/math";
declare namespace gpx = "http://www.topografix.com/GPX/1/1";
declare variable $gpxquery:earth-radius := 6371000.0;
declare function gpxquery:trk-distance($gpx as element(gpx:gpx))
as xsd:float*
{
  for $trk in 1 to count($gpx/gpx:trk)
  return sum(gpxquery:trk-distance-recurse($gpx/gpx:trk[$trk]/gpx:trkseg/gpx:trkpt))
};
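Since the function returns one total per <trk> element, a multi-track file yields a sequence of distances, which is why the invocation above takes the [1] item. A hypothetical usage sketch (the file name is made up for illustration):

```xquery
(: distance of the first track of a GPX file, in meters :)
let $GPX := doc("walk.gpx")  (: hypothetical file name :)
return gpxquery:trk-distance($GPX/gpx:gpx)[1]
```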
declare function gpxquery:trk-distance-recurse($trkpts as element(gpx:trkpt)*)
as xsd:float*
{
  if ( count($trkpts) le 1 )
  then 0
  else (
    gpxquery:distance-between-points($trkpts[1]/@lat, $trkpts[1]/@lon, $trkpts[2]/@lat, $trkpts[2]/@lon) ,
    gpxquery:trk-distance-recurse($trkpts[position() gt 1])
  )
};
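For comparison, the same pairwise summation can be written without recursion at all, by pairing each point with its successor in a single for loop. This is only a sketch, and the name trk-distance-iterate is mine, not part of the library; it sidesteps deep call stacks entirely, and also avoids re-materializing the remainder sequence with $trkpts[position() gt 1] on every call, which makes the recursive version quadratic in the number of points:

```xquery
declare function gpxquery:trk-distance-iterate($trkpts as element(gpx:trkpt)*)
as xsd:float*
{
  (: pair point $i with point $i + 1; no recursion, so stack depth stays constant :)
  for $i in 1 to count($trkpts) - 1
  return gpxquery:distance-between-points(
    $trkpts[$i]/@lat, $trkpts[$i]/@lon,
    $trkpts[$i + 1]/@lat, $trkpts[$i + 1]/@lon)
};
```

Note also that in the recursive version the recursive call is the last item of a sequence constructor, not the whole return expression, so an engine may legitimately not treat it as a tail call.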
declare function gpxquery:distance-between-points($lat1 as xsd:float, $lon1 as xsd:float, $lat2 as xsd:float, $lon2 as xsd:float)
as xsd:float
{
  let $dlat := ($lat2 - $lat1) * math:pi() div 180
  let $dlon := ($lon2 - $lon1) * math:pi() div 180
  let $rlat1 := $lat1 * math:pi() div 180
  let $rlat2 := $lat2 * math:pi() div 180
  let $a := math:sin($dlat div 2) * math:sin($dlat div 2) + math:sin($dlon div 2) * math:sin($dlon div 2) * math:cos($rlat1) * math:cos($rlat2)
  let $c := 2 * math:atan2(math:sqrt($a), math:sqrt(1 - $a))
  return xsd:float($c * $gpxquery:earth-radius)
};
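As a sanity check, the haversine helper can be called directly with two nearby coordinates (the values here are illustrative, close to the sample point above):

```xquery
(: great-circle distance in meters between two nearby points :)
gpxquery:distance-between-points(
  xsd:float(37.878523), xsd:float(-122.267946),
  xsd:float(37.878900), xsd:float(-122.268400))
```

For points this close together the result should be on the order of a few dozen meters.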
Is there anything I should or could do differently in terms of code structure and algorithm to avoid these memory issues for bigger files? Or does this look like a reasonable approach to the general problem, so that whoever uses the library simply has to make sure the runtime environment can deal with the requirements of deeply nested recursive calls?
Any feedback from people working with recursive functions and running into similar issues would be greatly appreciated.