
The following Freebase MQL query finds 5 artists and up to 50 albums for each artist.

[{
  "type" : "/music/artist",
  "name":null,
  "album" : [{
    "name" : null,
    "count":null,
    "limit":50
  }],
  "limit":5
}]

First try: without a subquery

I can write SPARQL like this:

SELECT ?artist ?album
WHERE
{
    ?artist :type :/music/artist .
    ?artist :album ?album
}
LIMIT n

But I don't know what value of n to specify, because as far as I know SPARQL results have no hierarchy.

Second try: with a subquery (not sure this works correctly)

The following subquery appears to work.

SELECT ?artist ?album
WHERE
{
    ?artist :album ?album .
    {
        SELECT ?artist
        WHERE
        {
            ?artist :type :/music/artist
        }
        LIMIT k
    }
}
LIMIT n

But I don't know how to choose k and n to get 50 albums for each of 5 artists.

Sample data with an endpoint

Could anyone write a SPARQL query that returns 5 artists and 5 paintings for each artist?

The query below returns artists and their paintings without limiting the results.

PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbpedia-owl:<http://dbpedia.org/ontology/>
PREFIX prop:<http://dbpedia.org/property/>

SELECT ?painting ?artist
WHERE
{
    ?painting prop:artist ?artist .
    {
        SELECT ?artist
        {
            ?artist rdf:type dbpedia-owl:Artist.
        }
    }
}

Thanks.

Jason Heo
  • Are you asking how to get n artists, with at most k results per artist? – Joshua Taylor Feb 09 '15 at 21:22
  • If that's the case, have a look at (possible duplicate, but no answer there either): [Nested queries in sparql with limits](http://stackoverflow.com/q/21018518/1281433), and have a look at the comments on that question, including the links to questions on other sites. However, answers.semanticweb.com is down right now, so see [How to limit SPARQL solution group size?](http://goo.gl/NiI2Qm) and [SPARQL INNER LIMIT](http://goo.gl/fHXqwj) – Joshua Taylor Feb 09 '15 at 21:27
  • @JoshuaTaylor Thank you for providing good resources. The last two links helped me a lot. I found that my `sub-query` is wrong (I confused the inner query with the outer query) and that doing this in SPARQL is hard. Thanks. – Jason Heo Feb 10 '15 at 04:04

2 Answers


Max and I had a bit of discussion in a chat, and this might end up being the same approach that Max took. I think it's a bit more readable, though. It gets 15 artists with albums, and up to 5 albums for each one. If you want to be able to include artists without any albums, you'd need to make some parts optional.

select ?artist ?album {
  #-- select 15 bands that have albums (i.e., 
  #-- such that they are the artist *of* something).
  {
    select distinct ?artist { 
      ?artist a dbpedia-owl:Band ;
              ^dbpedia-owl:artist []
    }
    limit 15
  }

  #-- grab ordered pairs (x,y) (where x <= y) of their
  #-- albums.  By asking how many x's there are for each y,
  #-- we get just the first n y's.
  ?artist ^dbpedia-owl:artist ?album, ?album_
  filter ( ?album_ <= ?album ) 
}
group by ?artist ?album
having count(?album_) <= 5 #-- take up to 5 albums for each artist
order by ?artist ?album

SPARQL results
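
To make the counting trick in the query above easier to follow, here is a minimal Python sketch of the same idea. It is not a SPARQL engine: the function name `first_n_per_artist` and the toy data in the usage note are hypothetical, and plain string comparison stands in for the `<=` comparison on album IRIs. Each album is paired with every album of the same artist that sorts at or before it, so the number of pairs for an album is its rank within that artist.

```python
from collections import Counter


def first_n_per_artist(pairs, n):
    """Return up to n (artist, album) pairs per artist, smallest albums first.

    pairs: iterable of (artist, album) strings (hypothetical toy data).
    """
    # An RDF graph is a set of triples, so duplicates are dropped first.
    by_artist = {}
    for artist, album in set(pairs):
        by_artist.setdefault(artist, []).append(album)

    # Self-join: one row per (artist, album, album_) with album_ <= album.
    # Counting rows per (artist, album) mirrors GROUP BY ?artist ?album.
    rank = Counter(
        (artist, album)
        for artist, albums in by_artist.items()
        for album in albums
        for album_ in albums
        if album_ <= album
    )

    # HAVING count(?album_) <= n, then ORDER BY ?artist ?album.
    return sorted(key for key, count in rank.items() if count <= n)
```

For example, `first_n_per_artist([("x", "a"), ("x", "b"), ("x", "c"), ("y", "d")], 2)` keeps `a` and `b` for artist `x` and `d` for artist `y`. Because the grouping key includes the album, the count acts as a per-album rank rather than a per-artist total.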

Joshua Taylor
  • It may be as effective and seems simpler but still looks awkward... Alas there is no other way besides running two distinct queries (which may be ineffective if the join should happen on blank nodes) – Max Feb 12 '15 at 19:24
  • Tested it, the results conform to expectations and evaluation is also much faster, I'll keep this pattern in mind. Perhaps the question should be renamed to something like "SPARQL co-related (sub)queries with limits" for future reference. – Max Feb 13 '15 at 12:04

Based on the result you want, this requires some kind of nested correlated subquery processing, which is not directly feasible in a single SPARQL query (at least to my understanding, but if it is possible, I'm totally in ;) ):

Due to the bottom-up nature of SPARQL query evaluation, the subqueries are evaluated logically first, and the results are projected up to the outer query.

Because the second LIMIT clause is applied after the join with the subquery has been evaluated, it only limits the number of results of the outer query.

Using a LIMIT k (k=5) clause on the subquery of your second try does return the 5 artists you need, but setting n to 50 only caps the album results (outer query) at 50 in total across those 5 artists, not at 50 per artist as you want. Turning the queries inside out would have a similar effect.
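
The effect of the two limits can be sketched in a few lines of Python. This only mimics the evaluation order described above (inner LIMIT, then join, then outer LIMIT); the function name and the toy data are hypothetical, and dictionary order stands in for whatever order an endpoint happens to return, since SPARQL guarantees none without ORDER BY.

```python
from itertools import islice

# Hypothetical toy data: artist -> albums (not taken from DBpedia or Freebase).
ALBUMS = {
    "artist1": ["a1", "a2", "a3", "a4"],
    "artist2": ["b1", "b2", "b3"],
    "artist3": ["c1"],
}


def subquery_then_outer_limit(albums_by_artist, k, n):
    """Mimic: inner SELECT ?artist ... LIMIT k, join on ?artist, outer LIMIT n."""
    # The subquery is evaluated first and yields at most k artists.
    inner_artists = list(islice(albums_by_artist, k))
    # The join produces one flat row per (artist, album) pair.
    joined = [
        (artist, album)
        for artist in inner_artists
        for album in albums_by_artist[artist]
    ]
    # The outer LIMIT cuts the flat row list; it knows nothing about artists.
    return joined[:n]
```

With `k=2` and `n=3`, all three rows returned for this toy data belong to `artist1`, and `artist2` gets none, which is the "global n" behaviour rather than "n per artist".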

EDIT: A possible solution would be to build a subquery over all artists/albums and restrict it to the rows where the count of (somehow) ordered albums is lower than 50 (here ordering by the album IRI as a string):

PREFIX dbpedia-owl:<http://dbpedia.org/ontology/>
PREFIX prop:<http://dbpedia.org/property/>
SELECT ?artist ?outputAlbum
WHERE 
{
    {
        SELECT ?artist (MAX(str(?album1)) as ?maxedAlbum)
        WHERE {
            ?album1 prop:artist ?artist .
            ?album2 prop:artist ?artist .
            FILTER (str(?album2) < str(?album1))
        } 
        GROUP BY ?artist 
        HAVING count(?album2)<= 50
        LIMIT 5
    } 
    ?outputAlbum prop:artist ?artist .
    FILTER (str(?outputAlbum) < str(?maxedAlbum))
}

EDIT 2: The last query is the naive approach, but there seems to be some inference (under an unknown regime) on the DBpedia endpoint, as shown below. A more exact query would need more filters and DISTINCT clauses (I added a distinct count and a global count to the output to show that there is still some inference somewhere):

PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbpedia-owl:<http://dbpedia.org/ontology/>
PREFIX prop:<http://dbpedia.org/property/>
SELECT ?artist ?outputAlbum ?maxedCount ?inferredCrossJoinCount
WHERE 
{
    {
        SELECT ?artist (MAX(str(?album1)) as ?maxedAlbum) (count(distinct ?album2) as ?maxedCount) (count(?album2) as ?inferredCrossJoinCount)
        WHERE {
            ?artist rdf:type dbpedia-owl:Artist .
            ?album1 ?p ?artist .
            ?album2 ?p ?artist .
            FILTER (sameTerm(?p, prop:artist))
            FILTER (str(?album1) < str(?album2))
        } 
        GROUP BY ?artist 
        #HAVING count(?album2)<= 50
        LIMIT 5
    } 
    ?outputAlbum ?p ?artist .
    FILTER (sameTerm(?p, prop:artist))
    FILTER (str(?outputAlbum) < str(?maxedAlbum))
}
Max
  • Thanks. Now, I can understand `Bottom-up nature` with your help. So, There are no equivalent SPARQL queries compared to above MQL. Right? SPARQL is very powerful but it's quite difficult to me ;-) – Jason Heo Feb 10 '15 at 11:53
  • I may have provided a solution (quite awkward as it is, and yours to check) in the answer. Not sure how it would work with Freebase, but that would be the spirit based on current SPARQL recommendations. – Max Feb 10 '15 at 12:10
  • Thank you for having time. According to this [Q&A](http://webcache.googleusercontent.com/search?q=cache:MicS6IP9uSAJ:answers.semanticweb.com/questions/9842/how-to-limit-sparql-solution-group-size+&cd=1&hl=en&ct=clnk&gl=us) it seems like that doing with SPARQL is very difficult. [Here](https://www.freebase.com/query) you can run above MQL and can see its output. – Jason Heo Feb 10 '15 at 12:35
  • You're welcome. As it is I'm currently trying to find a way to evaluate nested co-related subqueries while working around SPARQL caveats... thus if someone has a more elegant solution I'm totally interested ;) – Max Feb 10 '15 at 12:44
  • I've edited my question. There is now an actual endpoint and a sample query. Fetching just the URI rather than the name would be fine (this keeps the query simple). I would appreciate it if you could help me, but if not, that's fine. It's my job ;-). Thank you! – Jason Heo Feb 10 '15 at 12:46
  • @Max Can you explain your query a bit more? It doesn't really *limit* the number of albums to 50 per artist, but would select artists who have at most 50 albums, right? And `!sameTerm(?refAlbum)` doesn't make any sense; [sameTerm](http://www.w3.org/TR/sparql11-query/#func-sameTerm) requires two arguments. It doesn't make sense with just one. It doesn't make sense to `GROUP BY ?artists` either; since there's no `?artists` in the query. – Joshua Taylor Feb 12 '15 at 15:53
  • @inos-heo, sorry, I got interrupted while testing on the DBpedia endpoint. I changed the query so it can be tested on DBpedia (@Joshua, as you mentioned, the sameTerm call was not correct, along with other mistakes) – Max Feb 12 '15 at 16:40
  • @Joshua as for the explanation, the subquery captures, per artist, the last album out of at most 50 through the HAVING clause (i.e. having at most 50 albums preceding it, sorted by URI) and limits this selection to 5 artists through the LIMIT clause. Then the subquery is joined on artist to get all the albums of each artist that precede the last one. – Max Feb 12 '15 at 16:43
  • @Max why `FILTER (sameTerm(?p, prop:artist))`? Why not just use `?album1 prop:artist ?artist . ?album2 prop:artist ?artist`; or even better `?artist ^prop:artist ?album1, ?album2`? – Joshua Taylor Feb 12 '15 at 16:45
  • When you group by `?artist`, you've already retrieved all the albums of each `?artist`. The `having count(distinct ?album2) <= 50` clause doesn't retroactively go back and remove some `?album2` values; it removes values of `?artist` that had more than 50 distinct values for `?album2`. – Joshua Taylor Feb 12 '15 at 16:46
  • No that just what the `FILTER (str(?album2) < str(?album1))` clause is for: we perform a cross join on the albums to capture whatever album1 is preceded by !at most! 50 album2 bindings. – Max Feb 12 '15 at 16:51
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/70803/discussion-between-max-and-joshua-taylor). – Max Feb 12 '15 at 16:59
  • I posted [an answer](http://stackoverflow.com/a/28484652/1281433). I think it's the same approach, but I find it a bit more readable, and a little bit easier to follow. – Joshua Taylor Feb 12 '15 at 18:08
  • Sorry for late reply. (I live in different timezone) I didn't know that there is discussion. Thank you and @JoshuaTalyor. Though 2 answers are great, I decide to accept Joshua's answer and to award bounty 100 reputation to Max because I can't accept two answers. Thank you so much. – Jason Heo Feb 13 '15 at 01:41
  • @InoS Heo Thanks, but you may do as you wish ;) – Max Feb 13 '15 at 08:26
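
The point raised in the comments about `GROUP BY ?artist` with a `HAVING` count can be illustrated with a small Python sketch. It is only an approximation of the grouping semantics, using hypothetical names and toy data: when rows are grouped by artist alone, the count covers every (album1, album2) pair of that artist, so the test keeps or drops the whole artist and never trims an artist's albums.

```python
def artists_passing_having(albums_by_artist, limit):
    """Mimic GROUP BY ?artist HAVING count(?album2) <= limit over a self-join."""
    kept = []
    for artist, albums in albums_by_artist.items():
        # Rows in this artist's group: all pairs with album2 < album1.
        pair_count = sum(1 for a1 in albums for a2 in albums if a2 < a1)
        # HAVING is evaluated once per group, so the artist is kept or dropped whole.
        if pair_count <= limit:
            kept.append(artist)
    return kept


# Hypothetical toy data: one artist with 3 albums, one with 12.
TOY = {
    "small": ["a", "b", "c"],
    "large": [f"album{i:02d}" for i in range(12)],
}
```

Here `artists_passing_having(TOY, 50)` returns only `"small"`: the 12-album artist produces 66 pairs and is removed entirely. Grouping by both artist and album instead makes the same count behave as a per-album rank.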