SPARQL, distance and very long response times

Question

I am using SPARQL in a Spring Boot project.

I wrote a query that computes a distance and then keeps only the results whose distance is less than x.

However, the query takes a long time to run (around 2 seconds, sometimes more). My computer has 32 GB of RAM and my production server has far less (1 GB), but the time is the same on both.

Query 1:

    PREFIX (...)
    PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
    PREFIX geo: <http://www.opengis.net/ont/geosparql#>
    PREFIX sf: <http://www.opengis.net/ont/sf>
    PREFIX uom: <http://www.opengis.net/def/uom/OGC/1.0/>
    SELECT ?url ?contactUri ?email ?id ?descri_courte ?geo ?lon ?lat ?x ?geom ?lattype ?wkt ?xr ?longitude
    WHERE {
      ?url :isLocatedAt ?place .
      ?place schema:geo ?geo .            ## ?place has geographic coordinates ?geo
      ?geo schema:longitude ?longitude ;  ## ?geo has longitude ?longitude
           schema:latitude ?latitude .
      BIND (STRDT(CONCAT("POINT(", str(?longitude), " ", str(?latitude), ")"), geo:wktLiteral) AS ?wkt)
      BIND (STRDT(CONCAT("POINT(", str(10.9999), " ", str(52.9999), ")"), geo:wktLiteral) AS ?wp2)
      BIND (geof:distance(?wkt, ?wp2, uom:metre) AS ?xr)
      FILTER (?xr < 2000)

      ?url <https://www.datatourisme.gouv.fr/ontology/core#hasContact> ?contactUri .
      ?contactUri a <https://www.datatourisme.gouv.fr/ontology/core#Agent> .
      OPTIONAL { ?contactUri schema:email ?email . }
      ?url dc:identifier ?id .
      OPTIONAL { ?url rdfs:label ?descri_courte }
      OPTIONAL { ?place schema:geo ?geo }
      OPTIONAL { ?geo schema:longitude ?lon ; schema:latitude ?lat . }
      OPTIONAL { ?geo rdf:type ?lattype }
      OPTIONAL { ?geo <https://www.datatourisme.gouv.fr/ontology/core#latlon> ?x . }
    }

On the other hand, a count over the same pattern runs noticeably faster:

Query 2:

    PREFIX (...)
    PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
    PREFIX geo: <http://www.opengis.net/ont/geosparql#>
    PREFIX sf: <http://www.opengis.net/ont/sf>
    PREFIX uom: <http://www.opengis.net/def/uom/OGC/1.0/>
    SELECT (COUNT(*) AS ?resultat)
    WHERE {
      ?url :isLocatedAt ?place .
      ?place schema:geo ?geo .            ## ?place has geographic coordinates ?geo
      ?geo schema:longitude ?longitude ;  ## ?geo has longitude ?longitude
           schema:latitude ?latitude .
      BIND (STRDT(CONCAT("POINT(", str(?longitude), " ", str(?latitude), ")"), geo:wktLiteral) AS ?wkt)
      BIND (STRDT(CONCAT("POINT(", str(10.9999), " ", str(52.9999), ")"), geo:wktLiteral) AS ?wp2)
      BIND (geof:distance(?wkt, ?wp2, uom:metre) AS ?xr)
      FILTER (?xr < 2000)
    }

I use Jena. I have already tried to shorten query 1, but the processing time does not seem to change. Could you please help me improve the performance of query 1?

Thank you.

Really? You deleted your old query and literally asked the same question again? Why? Editing your question would be the only way to change the query. My comments remain the same: **the geospatial index cannot be used when you create the WKT literals ad hoc**. Do you understand this? — UninformedUser, Jun 06 '21 at 09:23
Please, try to add the WKT data via e.g. a SPARQL `INSERT` query first, then see if this improves the speed when creating the geospatial index on it. — UninformedUser, Jun 06 '21 at 09:25
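A minimal sketch of what such an `INSERT` could look like, reusing the `POINT(...)` construction from the question. The `schema:` namespace is an assumption (the question elides its prefix declarations), and `geo:hasGeometry` / `geo:asWKT` are the standard GeoSPARQL properties, chosen here for illustration; the asker's data may need a different shape.

```sparql
# Assumption: schema: resolves to http://schema.org/ (not shown in the question).
PREFIX schema: <http://schema.org/>
PREFIX geo:    <http://www.opengis.net/ont/geosparql#>

# Store one WKT literal per ?geo node instead of building it in every query.
INSERT {
  ?place geo:hasGeometry ?geo .
  ?geo   geo:asWKT ?wkt .
}
WHERE {
  ?place schema:geo ?geo .
  ?geo schema:longitude ?longitude ;
       schema:latitude  ?latitude .
  # WKT points are written as "POINT(longitude latitude)".
  BIND (STRDT(CONCAT("POINT(", STR(?longitude), " ", STR(?latitude), ")"), geo:wktLiteral) AS ?wkt)
}
```

This would be run once after loading the data, so that later queries read `?wkt` from the store rather than concatenating strings for every row.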
I'm also curious: why are there so many different coordinates attached to the same `?geo` object? Do you need all of those? It leads to a lot of joins, but I think in the end the serialization of the query result might be the "bottleneck". Also, in-memory or TDB? What is the size of the data, and what is the size of the result? Can you share the data? — UninformedUser, Jun 06 '21 at 09:40
As I understand it, to calculate the distance I need the lon and lat in order to construct the point. My geo object contains a URL, so I have to "go down" one level to get lon and lat. — quentin5799, Jun 06 '21 at 09:47
For my tests I use part of the data (around 35%); the complete dataset is 3.30 GB. There are around 350,000 objects. — quentin5799, Jun 06 '21 at 09:49
Also, sorry, I don't speak English; I use a translator, and there is little documentation on geospatial SPARQL. — quentin5799, Jun 06 '21 at 09:55
You need WKT literals to use `distance` functions, that is correct. What I'm saying is that your dataset should contain those WKT literals **before** loading, so that the spatial index can be created and used to improve the speed. Either create those literals manually, or use the CLI option `--convert_geo` to convert `lat` and `long` from the `http://www.w3.org/2003/01/geo/wgs84_pos#` namespace as described [here](https://jena.apache.org/documentation/geosparql/geosparql-fuseki) — UninformedUser, Jun 06 '21 at 12:43
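As a rough illustration of the difference, here is how the count query might look once each `?geo` node already carries a stored WKT literal. It assumes the literal is attached with `geo:asWKT` and that `schema:` resolves to `http://schema.org/`; neither is confirmed by the question. The remaining patterns from the original query would be joined in as before.

```sparql
# Assumptions: schema: namespace, and a geo:asWKT literal already stored on ?geo.
PREFIX schema: <http://schema.org/>
PREFIX geo:    <http://www.opengis.net/ont/geosparql#>
PREFIX geof:   <http://www.opengis.net/def/function/geosparql/>
PREFIX uom:    <http://www.opengis.net/def/uom/OGC/1.0/>

SELECT (COUNT(*) AS ?resultat)
WHERE {
  ?place schema:geo ?geo .
  ?geo geo:asWKT ?wkt .   # read the stored literal, no CONCAT/STRDT per row
  # The fixed reference point is written directly as a typed literal.
  FILTER (geof:distance(?wkt, "POINT(10.9999 52.9999)"^^geo:wktLiteral, uom:metre) < 2000)
}
```

This removes the per-row literal construction. Whether the spatial index is actually consulted depends on how the dataset and engine are set up, which this sketch does not cover.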
OK, I understand your logic, but I don't understand how to apply it. When I launch my Spring Boot app I do this: `Model model = null; Dataset dataset = TDBFactory.createDataset(direct.replace(".rdf", "")); model = dataset.getDefaultModel(); model.read("dataUrlAndToken", "RDF/XML"); log.info("terminé");` — quentin5799, Jun 06 '21 at 12:56
Then I run the queries through JDBC, as described in the Jena documentation: `MemDriver.register(); Connection conn = DriverManager.getConnection("jdbc:jena:tdb:location=" + GestionTBD.getDossier_tbd_a_utiliser() + "&must-exist=true"); java.sql.Statement stmt = conn.createStatement(); java.sql.ResultSet rset = stmt.executeQuery(String.format(COMMUNE_SEARCH2, s_lon, s_lat, s_dist));` — quentin5799, Jun 06 '21 at 12:57
I don't understand how to do this given the way I have set things up myself. — quentin5799, Jun 06 '21 at 14:55
