First, a disclaimer: I don't have a long computer science background and only started working with the Semantic Web this year, so I apologize in advance for any imprecise or non-scientific terms or bad coding style in this question.
Here is my task: I want to find the DBpedia resources whose labels are closest to some expressions that I previously extracted from documents. To that end, I use a custom filter function (for example, a Dice coefficient calculation that returns a score between 0 and 1) to compute the similarity between the DBpedia labels and the extracted expression. I am using Apache Jena.
Ex1: extracted: "bea systems" -> closest DBpedia label: "BAE Systems Inc.", etc.
Ex2: extracted: "harper-collins publishing company" -> closest DBpedia labels: "Harper-Collins", "HarperCollins", "HarperCollins Publishers", etc.
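For context, a Dice coefficient over character bigrams can be sketched as below. This is only an illustration of the kind of score meant here, not the actual fn:DiceCoeff implementation: the class name DiceSketch, the use of character bigrams, and the case-insensitive comparison are all assumptions.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not the fn:DiceCoeff class used in the query.
public class DiceSketch {

    // Count the character bigrams of the lower-cased string (a multiset).
    static Map<String, Integer> bigrams(String s) {
        String t = s.toLowerCase(Locale.ROOT);
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + 1 < t.length(); i++) {
            counts.merge(t.substring(i, i + 2), 1, Integer::sum);
        }
        return counts;
    }

    // Total number of bigrams in the multiset.
    static int size(Map<String, Integer> counts) {
        int total = 0;
        for (int c : counts.values()) {
            total += c;
        }
        return total;
    }

    // Dice coefficient: 2 * |shared bigrams| / (|bigrams(a)| + |bigrams(b)|).
    public static double dice(String a, String b) {
        Map<String, Integer> x = bigrams(a);
        Map<String, Integer> y = bigrams(b);
        int total = size(x) + size(y);
        if (total == 0) {
            // Both strings are shorter than two characters: fall back to equality.
            return a.equalsIgnoreCase(b) ? 1.0 : 0.0;
        }
        int shared = 0;
        for (Map.Entry<String, Integer> e : x.entrySet()) {
            shared += Math.min(e.getValue(), y.getOrDefault(e.getKey(), 0));
        }
        return 2.0 * shared / total;
    }

    public static void main(String[] args) {
        System.out.println(dice("night", "nacht"));
        System.out.println(dice("harper-collins publishing company", "HarperCollins Publishers"));
    }
}
```

The score is 1 for strings with identical bigram multisets and 0 when they share no bigram, which is why it is convenient to sort by in an ORDER BY DESC clause.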
My problem is that I need to run the query against a DBpedia endpoint, because the dataset is too large to load locally (I run out of memory). However, I get an HTTP 500 error, as my function is stored locally and I'm querying a remote endpoint:
Exception in thread "main" HttpException: 500
at com.hp.hpl.jena.sparql.engine.http.HttpQuery.rewrap(HttpQuery.java:414)
at com.hp.hpl.jena.sparql.engine.http.HttpQuery.execGet(HttpQuery.java:358)
at com.hp.hpl.jena.sparql.engine.http.HttpQuery.exec(HttpQuery.java:295)
at com.hp.hpl.jena.sparql.engine.http.QueryEngineHTTP.execSelect(QueryEngineHTTP.java:346)
at jena.example.similar.propfunction.DistanceTest.main(DistanceTest.java:48)
Here is my query code:
Node exp = NodeFactory.createLiteral("harper-collins publishing company") ;
String queryString = "" +
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> " +
    "PREFIX fn: <java:jena.example.similar.propfunction.> " +
    "PREFIX dbpedia-owl: <http://dbpedia.org/ontology/> " +
    "SELECT ?company ?label ?funcRes " +
    "WHERE {" +
    "?company a dbpedia-owl:Company . " +
    "?company rdfs:label ?label . " +
    "BIND (fn:DiceCoeff(?label, "+exp+") as ?funcRes) " +
    "FILTER (lang(?label) = \"en\")" +
    "}" +
    "ORDER BY DESC(?funcRes) " +
    "LIMIT 10 " ;
Query query = QueryFactory.create(queryString) ;
// execute the query
QueryExecution qexec = QueryExecutionFactory.sparqlService("http://dbpedia.org/sparql", query);
try {
    ResultSet results = qexec.execSelect() ;
    ResultSetFormatter.out(System.out, results, query) ;
} finally { qexec.close() ; }
The filter function itself works fine: I tested it with the same kind of query (i.e. using BIND and ORDER BY) on another, smaller dataset (not DBpedia) that I access locally, and it gave me the expected results.
So, is there any way to use a custom filter function on a remote endpoint? If not, what other options are there for the task I'm doing? (I've read the discussion in "How I can write SPARQL query that uses similarity measures in Java Code", but it doesn't seem to be the best fit for me.)
I would appreciate any suggestions from the community :)