I want to build a Named Entity Recognizer using Wikipedia data. To do that, I need to get all the superclasses of a word to see which category (Place, Human, Organization or None) it falls into. I searched the Internet a lot and found some pages like:

where executing the query returns "No matching records found", even with the word mentioned on the page and after trying other namespaces, and:

which is very similar to my work, but I get the "No matching records found" result there too.

I think the queries mentioned in these links are logically correct, but I have no idea why they return nothing for me. I also tried to learn SPARQL from the examples on these sites:

but I didn't find anything about finding the superclasses of a word.

Here are some examples of queries that returned no results for me:

PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX ns:<http://dbpedia.org/>

SELECT ?subClass ?label WHERE { 
    ?subClass rdfs:subClassOf ns:Albert . 
    ?subClass rdfs:label ?label . }

or:

SELECT * WHERE {
  dbpedia:Albert a ?c1 ; a ?c2 .
  ?c1 rdfs:subClassOf ?c2 .
}
vvvvv
Mina smz

3 Answers

  1. Who is "Albert"? You can only query for data that exists in DBpedia. There is no resource http://dbpedia.org/resource/Albert.

  2. Your first query uses the wrong namespace. At least, I've never seen http://dbpedia.org/ as a namespace; for resources it's usually http://dbpedia.org/resource/.

  3. Your first query uses the rdfs:subClassOf predicate incorrectly if "Albert" is supposed to be a resource. That a resource :x belongs to a class :C is expressed by the RDF triple `:x a :C .` and that the class :C has a superclass :D is expressed by `:C rdfs:subClassOf :D .`

  4. Your second query again uses an old namespace prefix, dbpedia:, which is now called dbr: and stands for exactly the namespace http://dbpedia.org/resource/. But as mentioned in my first point, there is no resource for "Albert".

  5. What is the "superclass of a word"? Just to clarify, resources belong to a class, and a class can have superclasses.

If you want all the classes a resource belongs to, including their superclasses, you can use the following, e.g. for "Tom Hanks":

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT DISTINCT ?c WHERE {
  dbr:Tom_Hanks a/rdfs:subClassOf* ?c .
}
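Once the query has returned its list of class IRIs, one possible way to reduce them to the categories from the question (Place, Human, Organization or None) is a simple lookup. This is only a minimal sketch in Python: the mapping table and the function name `categorize` are my own assumptions for illustration, and choosing dbo:Person, dbo:Place and dbo:Organisation as the deciding classes is a design choice, not something DBpedia prescribes.

```python
# Hypothetical mapping from DBpedia ontology classes to the question's categories.
CATEGORY_BY_CLASS = {
    "http://dbpedia.org/ontology/Person": "Human",
    "http://dbpedia.org/ontology/Place": "Place",
    "http://dbpedia.org/ontology/Organisation": "Organization",
}


def categorize(class_iris):
    """Return the category of the first class IRI found in the mapping, else "None"."""
    for iri in class_iris:
        category = CATEGORY_BY_CLASS.get(iri)
        if category is not None:
            return category
    return "None"


# Example input: class IRIs as they would come back in the ?c column.
classes = [
    "http://dbpedia.org/ontology/Agent",
    "http://dbpedia.org/ontology/Person",
    "http://www.w3.org/2002/07/owl#Thing",
]
print(categorize(classes))
```

If a resource carries classes from more than one category, this sketch simply returns the first match in iteration order, so you may want a stricter rule for your recognizer.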
UninformedUser
  • Thank you so much for your answer, @AKSW. Yes, "Albert" doesn't exist; I just wanted to make an example. I get results from your query on the DBpedia query service, but it doesn't work on the Wikidata query service. The reason I want it to work on Wikidata is that it can get item IDs via the wd and wdt prefixes. I didn't find such functionality on the DBpedia query service. – Mina smz May 04 '17 at 06:52
  • @AKSW thanks a lot for the great answer. When I change your above code to `dbr:Word2vec` I get very confusing results (e.g., 'band', 'musical group', etc.). However, word2vec is clearly a computer science technique. Therefore, I was expecting superclasses such as `technique`,`algorithm`, `computer science` etc. Just wondering about the actual results I got. Looking forward to hearing from you :) – EmJ Jun 27 '19 at 03:44
  • @AKSW I have mentioned below the actual results I got using dbr:Word2vec `c http://dbpedia.org/ontology/Band http://schema.org/MusicGroup http://dbpedia.org/ontology/Group http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#SocialPerson http://schema.org/Organization http://dbpedia.org/ontology/Organisation http://dbpedia.org/ontology/Agent http://www.w3.org/2002/07/owl#Thing` – EmJ Jun 27 '19 at 03:46
  • @Emi well, we both know that this is wrong, but it's just what's in the DBpedia dataset. The query is correct, it's just yet another data quality issue. Look at the resource page in the browser: http://dbpedia.org/page/Word2vec - it has type `dbo:Band`. Look at the Wikipedia page: there is no infobox on the right side, which is more or less the main source for DBpedia. It has just this machine learning overview, which in fact is not a page-specific infobox. Thus, DBpedia has to infer the type from something else. This was done in a research project; the dataset is `instance_types_sdtyped_dbo` – UninformedUser Jun 27 '19 at 06:40
  • @Emi Again, you should contact the DBpedia developers/maintainers on their mailing list or just open a GitHub issue on the DBpedia project page: https://github.com/dbpedia/extraction-framework/issues – UninformedUser Jun 27 '19 at 06:40
  • OK, I checked both instance types datasets, from the older downloads page: https://wiki.dbpedia.org/downloads-2016-10#datasets as well as from the new Databus page: https://databus.dbpedia.org/marvin/mappings/instance-types/2019.06.01 - I could not find the triple. Please ask the DBpedia people; it's weird to have such a type triple that comes from nowhere – UninformedUser Jun 27 '19 at 07:01

So the subClassOf predicate only applies to classes of things, not to instances in general. You need to connect the instance to its class via rdf:type.

SELECT * WHERE {
  <http://dbpedia.org/resource/Albert_Einstein> a ?c1 ; a ?c2 .
  ?c1 rdfs:subClassOf ?c2 .
}

I am not sure what type of entities you can get from Albert; it probably requires disambiguation. My example queries use Albert Einstein as the DBpedia resource.

Bear in mind that there could be multiple hops to the root class, depending on the level of abstraction you are interested in. This second query goes up two levels.

SELECT DISTINCT ?c3 WHERE {
  <http://dbpedia.org/resource/Albert_Einstein> a ?c1 ; a ?c2 .
  ?c1 rdfs:subClassOf ?c2 .
  ?c2 rdfs:subClassOf ?c3 .
}
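
To illustrate why a fixed number of hops may not reach the root, here is a small Python sketch that walks a subClassOf hierarchy until nothing new is found, which is the same idea as a `rdfs:subClassOf*` property path. The hierarchy in `SUBCLASS_OF` is toy data with shortened class names that I made up for the example, and `all_superclasses` is a hypothetical helper, not part of any library.

```python
from collections import deque

# Toy hierarchy (an assumption for illustration): child class -> direct superclasses.
SUBCLASS_OF = {
    "Band": ["Group"],
    "Group": ["Organisation"],
    "Organisation": ["Agent"],
    "Agent": ["Thing"],
}


def all_superclasses(direct_types, subclass_of):
    """Return the direct types plus every class reachable via subClassOf edges."""
    seen = set(direct_types)
    queue = deque(direct_types)
    while queue:
        cls = queue.popleft()
        for parent in subclass_of.get(cls, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


print(sorted(all_superclasses(["Band"], SUBCLASS_OF)))
```

In this toy data, going up only two levels from "Band" stops at "Organisation", while the full walk also reaches "Agent" and "Thing".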
Manuel Salvadores
  • Thank you @msalvadores for your helpful answer. Your query works on the DBpedia query service, but I need it to work on the Wikidata query service, where it doesn't work. – Mina smz May 04 '17 at 06:56

Probably, you are looking for something like this query:

SELECT DISTINCT ?c WHERE {
  ?Q wdt:P31/wdt:P279? ?c .
  ?Q rdfs:label "Tom Hanks"@en
} 

Wikidata uses its own predicates, wdt:P31 and wdt:P279, in place of rdf:type and rdfs:subClassOf respectively.
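
If you need to run this for many words, one option is to generate the query text from the label. The following Python sketch only builds the query string shown above; the function name `build_wikidata_class_query` and its `lang` parameter are assumptions of mine, and sending the query to an endpoint is left out on purpose.

```python
def build_wikidata_class_query(label, lang="en"):
    """Build the class-lookup query for an exact label match (hypothetical helper)."""
    # Escape backslashes and double quotes so the label stays a valid string literal.
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "SELECT DISTINCT ?c WHERE {\n"
        "  ?Q wdt:P31/wdt:P279? ?c .\n"
        f'  ?Q rdfs:label "{escaped}"@{lang}\n'
        "}"
    )


print(build_wikidata_class_query("Tom Hanks"))
```

Note that the label match is exact, so "Tom Hanks" and "tom hanks" are treated as different strings.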

Stanislav Kralin