I'm trying to get all distinct labels of a Wikidata Item.

I know I can get all labels of an item with the following query:

SELECT DISTINCT ?team ?labels WHERE {
  ?team wdt:P31 wd:Q13393265.
  ?team rdfs:label ?labels.
}
LIMIT 10

But how would I go about only getting distinct labels (so no duplicates)?

I've tried the following:

SELECT DISTINCT ?team ?labels WHERE {
  ?team wdt:P31 wd:Q13393265.
  {
    SELECT DISTINCT * WHERE 
    {
      ?team rdfs:label ?labels.
    }
  }
}
LIMIT 10

But the results still contain duplicate labels.

PS: The limits are only there to keep the queries fast while debugging. Once the query works as intended, there will be no limit.

  • What do you mean by "duplicates"? Yes, some languages will have the same **lexical form**, i.e. the same string. If you want to get rid of duplicate raw strings, then `SELECT DISTINCT ?team (str(?labels) as ?label) WHERE {` is the way to go. – UninformedUser Dec 16 '22 at 19:26
  • @UninformedUser Thank you very much, that did it! I assumed `rdfs:label` would only return strings, so I didn't think I'd have to convert them. That was the missing info I needed. If you post it as a separate answer, I can mark this question as solved. – Hein Dec 16 '22 at 19:57

1 Answer

Your first query actually does return distinct labels, but the way Wikidata displays the results doesn't make that clear.

These four literals are all different, because their language tags differ (the first one has none), but Wikidata will display "foobar" for each of them:

"foobar"
"foobar"@en
"foobar"@en-US
"foobar"@es
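
A quick way to convince yourself of this is a minimal sketch that supplies the four example literals inline with VALUES, so no Wikidata data is involved. It should run on any SPARQL 1.1 endpoint; the variable name ?literal is just an illustration:

```sparql
# DISTINCT compares whole RDF terms (lexical form plus language tag),
# so none of these four literals counts as a duplicate of another.
SELECT DISTINCT ?literal
WHERE {
  VALUES ?literal { "foobar" "foobar"@en "foobar"@en-US "foobar"@es }
}
```

Despite the DISTINCT, this returns four rows, one per literal.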

To display the language, you can use lang():

SELECT DISTINCT ?team ?label (lang(?label) AS ?language) 
WHERE {
  ?team wdt:P31 wd:Q13393265.
  ?team rdfs:label ?label.
}
LIMIT 10

To ignore the language, you can use str(), which returns the lexical form:

SELECT DISTINCT ?team (str(?label) AS ?label_lexical)
WHERE {
  ?team wdt:P31 wd:Q13393265.
  ?team rdfs:label ?label.
}
LIMIT 10
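
If you want one row per lexical form but would still like to see which languages share it, a possible variation (a sketch, not something the question asked for) is to group on the str() value and collect the language tags with GROUP_CONCAT. The variable name ?languages and the ", " separator are arbitrary choices:

```sparql
SELECT ?team ?label_lexical
       (GROUP_CONCAT(DISTINCT LANG(?label); separator=", ") AS ?languages)
WHERE {
  ?team wdt:P31 wd:Q13393265.
  ?team rdfs:label ?label.
  # Strip the language tag so that equal strings fall into the same group.
  BIND(STR(?label) AS ?label_lexical)
}
GROUP BY ?team ?label_lexical
LIMIT 10
```

Note that the LIMIT here applies to the grouped rows, so the endpoint still has to aggregate the labels before cutting the result down, and the query may be slower than the ungrouped versions.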