I am using SPARQL in a Spring Boot project.

I wrote a query that computes a distance and then keeps only the results whose distance is less than x.

However, the processing time is very long (around 2 seconds, sometimes more). My computer has 32 GB of RAM and my production server has much less (1 GB), but the time is the same on both.

Query 1:

    PREFIX (...)
    PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
    PREFIX geo: <http://www.opengis.net/ont/geosparql#>
    PREFIX sf: <http://www.opengis.net/ont/sf#>
    PREFIX uom: <http://www.opengis.net/def/uom/OGC/1.0/>

    SELECT ?url ?contactUri ?email ?id ?descri_courte ?geo ?lon ?lat ?x ?geom ?lattype ?wkt ?xr ?longitude
    WHERE {
      ?url :isLocatedAt ?place .
      ?place schema:geo ?geo .              ## ?place has geographic coordinates ?geo
      ?geo schema:longitude ?longitude ;    ## ?geo has longitude ?longitude
           schema:latitude ?latitude .
      BIND (STRDT(CONCAT("POINT(", str(?longitude), " ", str(?latitude), ")"), geo:wktLiteral) AS ?wkt)
      BIND (STRDT(CONCAT("POINT(", str(10.9999), " ", str(52.9999), ")"), geo:wktLiteral) AS ?wp2)
      BIND (geof:distance(?wkt, ?wp2, uom:metre) AS ?xr)
      FILTER (?xr < 2000)

      ?url <https://www.datatourisme.gouv.fr/ontology/core#hasContact> ?contactUri .
      ?contactUri a <https://www.datatourisme.gouv.fr/ontology/core#Agent> .
      OPTIONAL { ?contactUri schema:email ?email . }
      ?url dc:identifier ?id .
      OPTIONAL { ?url rdfs:label ?descri_courte }
      OPTIONAL { ?place schema:geo ?geo }
      OPTIONAL { ?geo schema:longitude ?lon ; schema:latitude ?lat . }
      OPTIONAL { ?geo rdf:type ?lattype }
      OPTIONAL { ?geo <https://www.datatourisme.gouv.fr/ontology/core#latlon> ?x . }
    }

By contrast, a count over the same distance filter is clearly faster:

Query 2:

    PREFIX (...)
    PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
    PREFIX geo: <http://www.opengis.net/ont/geosparql#>
    PREFIX sf: <http://www.opengis.net/ont/sf#>
    PREFIX uom: <http://www.opengis.net/def/uom/OGC/1.0/>

    SELECT (COUNT(*) AS ?resultat)
    WHERE {
      ?url :isLocatedAt ?place .
      ?place schema:geo ?geo .              ## ?place has geographic coordinates ?geo
      ?geo schema:longitude ?longitude ;    ## ?geo has longitude ?longitude
           schema:latitude ?latitude .
      BIND (STRDT(CONCAT("POINT(", str(?longitude), " ", str(?latitude), ")"), geo:wktLiteral) AS ?wkt)
      BIND (STRDT(CONCAT("POINT(", str(10.9999), " ", str(52.9999), ")"), geo:wktLiteral) AS ?wp2)
      BIND (geof:distance(?wkt, ?wp2, uom:metre) AS ?xr)
      FILTER (?xr < 2000)
    }

I use Jena. I have already tried shortening query 1, but the processing time does not seem to change. Could you please help me improve the performance of query 1?

Thank you.

  • Really? You deleted your old question and asked literally the same one again? Why? Editing your question is the only way to change the query. My comments remain the same: **the geospatial index cannot be used when you create the WKT literals ad hoc**. Do you understand this? – UninformedUser Jun 06 '21 at 09:23
  • Please, try to add the WKT data via e.g. a SPARQL `INSERT` query first, then see if this improves the speed when creating the geospatial index on it. – UninformedUser Jun 06 '21 at 09:25
  • I'm also curious: why are there so many different coordinates attached to the same `?geo` object? Do you need all of them? It leads to a lot of joins, but I think that in the end the serialization of the query result might be the "bottleneck". Also, is this in-memory or TDB? What is the size of the data, and what is the size of the result? Can you share the data? – UninformedUser Jun 06 '21 at 09:40
  • As I understand it, to calculate the distance I need the lon and lat in order to construct the point. My geo object contains a URL, so I have to "go down" one level to get lon and lat. – quentin5799 Jun 06 '21 at 09:47
  • For my tests I use part (around 35%) of the data; the complete data is 3.30 GB. There are around 350,000 objects. – quentin5799 Jun 06 '21 at 09:49
  • Also, sorry, I don't speak English; I use a translator, and there is little documentation on geospatial SPARQL. – quentin5799 Jun 06 '21 at 09:55
  • you need WKT literals to use `distance` functions, that is correct. What I'm saying is, that your dataset should contain those WKT literals **before** loading such that the spatial index can be created and used to improve the speed. Either create those literals manually, or use the CLI option `--convert_geo` to convert `lat` and `long` from `http://www.w3.org/2003/01/geo/wgs84_pos#` namespace as described [here](https://jena.apache.org/documentation/geosparql/geosparql-fuseki) – UninformedUser Jun 06 '21 at 12:43
  • OK, I understand your logic, but I don't understand how to apply it. When I launch my Spring Boot app I do this: `Model model = null; Dataset dataset = TDBFactory.createDataset(direct.replace(".rdf", "")); model = dataset.getDefaultModel(); model.read("dataUrlAndToken", "RDF/XML"); log.info("terminé");` – quentin5799 Jun 06 '21 at 12:56
  • And then I run queries through JDBC, as described in the Jena documentation: `MemDriver.register(); Connection conn = DriverManager.getConnection("jdbc:jena:tdb:location=" + GestionTBD.getDossier_tbd_a_utiliser() + "&must-exist=true"); java.sql.Statement stmt = conn.createStatement(); java.sql.ResultSet rset = stmt.executeQuery(String.format(COMMUNE_SEARCH2, s_lon, s_lat, s_dist));` – quentin5799 Jun 06 '21 at 12:57
  • I don't understand how to do that with the setup I have built myself. – quentin5799 Jun 06 '21 at 14:55
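
The suggestion made in the comments, storing the WKT literals in the dataset once instead of building them inside every query, could be sketched as a SPARQL update along the following lines. This is only a sketch under assumptions: the `schema:` namespace is elided in the question (`PREFIX (...)`), so `http://schema.org/` is a guess, and attaching the literal to `?geo` with `geo:asWKT` is one possible modelling choice, not something the question confirms. Whether a spatial index then picks up these literals depends on how the Jena GeoSPARQL support is set up.

```sparql
# Assumption: the schema: prefix is not shown in the question; adjust it to match the data.
PREFIX schema: <http://schema.org/>
PREFIX geo:    <http://www.opengis.net/ont/geosparql#>

# Run once after loading the data: store one WKT point per ?geo node,
# built the same way as the first BIND in the queries of the question.
INSERT {
  ?geo geo:asWKT ?wkt .
}
WHERE {
  ?geo schema:longitude ?longitude ;
       schema:latitude  ?latitude .
  BIND (STRDT(CONCAT("POINT(", STR(?longitude), " ", STR(?latitude), ")"), geo:wktLiteral) AS ?wkt)
}
```

With the literals stored, the queries could match `?geo geo:asWKT ?wkt` in place of the first `BIND`, so the per-row string concatenation no longer happens at query time.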
