Given a vector of movie names, I would like to look up their genres by querying Wikidata.

Since I am an R user, I recently discovered WikidataQueryServiceR, which includes exactly the example I was looking for:

library(WikidataQueryServiceR)
query_wikidata('SELECT DISTINCT
  ?genre ?genreLabel
WHERE {
  ?film wdt:P31 wd:Q11424.
  ?film rdfs:label "The Cabin in the Woods"@en.
  ?film wdt:P136 ?genre.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}')

## 5 rows were returned by WDQS

Unfortunately, this query uses a hard-coded string, so I would like to replace The Cabin in the Woods with a vector. To do so, I tried the following code:

library(WikidataQueryServiceR)

example <- "The Cabin in the Woods" # Single string for testing purposes.

query_wikidata(paste('SELECT DISTINCT ?human ?humanLabel ?sex_or_gender ?sex_or_genderLabel WHERE {
  ?human wdt:P31 wd:Q5.
  ?human rdfs:label', example, '@en.
  ?human wdt:P21 ?sex_or_gender.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
  OPTIONAL { ?human wdt:P2561 ?name. }
}', sep = ""))

But that does not work as expected, as I get the following result:

Error in FUN(X[[i]], ...) : Bad Request (HTTP 400).

What am I doing wrong?

Cœur
ccamara

1 Answer

Have you tried printing your SPARQL query? The generated text has two problems:

  • There is no space after rdfs:label
  • There are no quotes around The Cabin in the Woods

In your R code, instead of

  ?human rdfs:label', example, '@en.

line 7 should be:

  ?human rdfs:label "', example, '"@en.
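Both problems are easy to spot if the query string is built first and printed before it is sent. The following is a minimal sketch using only base R; `build_query` is a hypothetical helper name introduced here for illustration, not part of WikidataQueryServiceR, and the call to `query_wikidata()` is left commented out because it needs the package and network access.

```r
# Hypothetical helper: builds the query for a single label,
# adding the space and the double quotes that the original code lacked.
build_query <- function(label) {
  paste0(
    'SELECT DISTINCT ?human ?humanLabel ?sex_or_gender ?sex_or_genderLabel WHERE {
  ?human wdt:P31 wd:Q5.
  ?human rdfs:label "', label, '"@en.
  ?human wdt:P21 ?sex_or_gender.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}'
  )
}

query <- build_query("John Lennon")
cat(query)  # inspect the SPARQL text before sending it

# library(WikidataQueryServiceR)
# query_wikidata(query)
```

Printing with `cat()` shows the query exactly as the endpoint will receive it, so a missing space or quote is visible immediately.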

Although query_wikidata() accepts a vector of query strings, I'd suggest using SPARQL 1.1 VALUES instead, so that all names are looked up in a single request rather than one request per name.

library(WikidataQueryServiceR)

example <- c("John Lennon", "Paul McCartney")

values <- paste(sprintf("('%s'@en)", example), collapse=" ")

query <- paste(
  'SELECT DISTINCT ?label ?human ?humanLabel ?sexLabel {
       VALUES(?label) {', values,
      '} 
       ?human wdt:P31 wd:Q5.
       ?human rdfs:label ?label.
       ?human wdt:P21 ?sex.
       SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
   }'
)  

query_wikidata(query)
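The same pattern can be carried over to the films-and-genres query from the question. The sketch below is an adaptation under a few assumptions: `escape_sparql` is a hypothetical helper added so that titles containing double quotes or backslashes do not break the literal, "Alien" is just an arbitrary second title, and the request itself is commented out since it needs network access. Note that `rdfs:label ?label` matches the English label exactly, so a title with different capitalization or punctuation would presumably return nothing.

```r
films <- c("The Cabin in the Woods", "Alien")

# Hypothetical helper: escape backslashes and double quotes
# so that a title cannot terminate the SPARQL string literal early.
escape_sparql <- function(x) {
  x <- gsub("\\", "\\\\", x, fixed = TRUE)
  gsub('"', '\\"', x, fixed = TRUE)
}

# One ("title"@en) tuple per film, separated by spaces.
values <- paste(sprintf('("%s"@en)', escape_sparql(films)), collapse = " ")

film_query <- paste0(
  'SELECT DISTINCT ?label ?film ?genre ?genreLabel WHERE {
  VALUES (?label) { ', values, ' }
  ?film wdt:P31 wd:Q11424.
  ?film rdfs:label ?label.
  ?film wdt:P136 ?genre.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}'
)

cat(film_query)  # check the generated SPARQL

# library(WikidataQueryServiceR)
# query_wikidata(film_query)
```

Keeping `?label` in the SELECT clause makes it possible to join the result back to the input vector afterwards.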

For a large number of VALUES, you will probably need the development version of WikidataQueryServiceR: it seems that only the development version supports POST requests.
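If a single query still turns out to be too large, one possible workaround is to split the input into batches and build one VALUES query per batch. This is only a sketch: `chunk` and `build_values_query` are hypothetical helpers, the batch size of 3 and the placeholder names are arbitrary, and a suitable batch size for the real endpoint is not something the source specifies.

```r
# Hypothetical helper: split a vector into batches of at most `size` elements.
chunk <- function(x, size) {
  split(x, ceiling(seq_along(x) / size))
}

# Hypothetical helper: build one VALUES query for a batch of labels.
build_values_query <- function(labels) {
  values <- paste(sprintf("('%s'@en)", labels), collapse = " ")
  paste0(
    'SELECT DISTINCT ?label ?human ?humanLabel ?sexLabel {
  VALUES (?label) { ', values, ' }
  ?human wdt:P31 wd:Q5.
  ?human rdfs:label ?label.
  ?human wdt:P21 ?sex.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}'
  )
}

names_vec <- sprintf("Name %d", 1:7)  # placeholder input data
batches <- chunk(names_vec, size = 3)
queries <- vapply(batches, build_values_query, character(1))

# library(WikidataQueryServiceR)
# results <- lapply(queries, query_wikidata)  # one request per batch
# combined <- do.call(rbind, results)
```

This trades one very large request for a handful of moderate ones, which stays well below one request per name.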

Stanislav Kralin
  • Your answer works great! You're right, I forgot those quotes and spaces. I do not understand the second part of your answer regarding SPARQL VALUES, though. – ccamara Mar 10 '18 at 08:33
  • @ccamara, SPARQL 1.1 `VALUES` is a way to provide inline data. It lets you run a single large query instead of many small ones ([this question](https://stackoverflow.com/a/45433716/7879193) is slightly related). – Stanislav Kralin Mar 10 '18 at 08:43
  • However, if the query is large, the request should be sent via POST rather than GET. WikidataQueryServiceR [provides](https://github.com/bearloga/WikidataQueryServiceR/issues/6) support for POST requests in the development version. Even so, I still receive `Request-URI Too Long (HTTP 414)` when, for example, `example <- 1:1000`. Possibly Wikidata does not allow such large requests even when they are sent via POST. See https://phabricator.wikimedia.org/T112151 – Stanislav Kralin Mar 10 '18 at 08:45