Serializing models and resolving references with spira/RDF.rb

Question

I'm using Spira as a model/persistence layer for a Ruby application. I'm having trouble getting a suitable serialization (e.g., as RDF/XML) for my individual models. For example, when I dump a model that contains "associations", I get XML that looks like:

<ns0:video rdf:about="info:whatever/videos/g91832990">
  <ns1:contributor rdf:resource="info:whatever/interviewees/g88129610"/>
  <ns1:title>Test Video</ns1:title>
  <ns0:files rdf:resource="info:whatever/files/g91776800"/>
</ns0:video>

However, I'd like this XML representation to resolve the rdf:resource references. That is, I'd like the XML to look more like this (which is what I get when I do a dump of the whole repository/triplestore):

<ns0:video rdf:about="info:repository/videos/g91832990">
  <ns1:contributor>
    <ns2:person rdf:about="info:repository/interviewees/g88129610">
      <ns2:name>Creator</ns2:name>
    </ns2:person>
  </ns1:contributor>
  <ns1:title>Test Video</ns1:title> <!-- ... -->
</ns0:video>

The contributor element is expanded to contain the relevant metadata. I can get the first-level references with a SPARQL query like:

sparql.construct([:o, :p2, :o2]).where([node, :p, :o], [:o, :p2, :o2])

where node is my "about" node. However, I want to do this to arbitrary depth. I understand that this question might touch on bigger issues, like doing recursive queries in SPARQL/RDF. However, I was hoping there would be some switch or setting in Spira or RDF.rb that would just change the output format.

Sorry about my terminology: I'm sure "resolving references" isn't the correct term to use.

EDIT

In Spira, models mix in RDF::Enumerable; their RDF representation consists of the statements in the triplestore whose subject is the model's URI. "Dumping a model" looks like:

v = Video.find 'RDF::Enumerable'
v.dump(:rdfxml)

The RDF/XML generated contains only the model's RDF statements. It's also possible to dump the whole triplestore (e.g., my second example above) with the following command:

Spira.repository.dump(:rdfxml)

When you say that you "dump" the model, is it correct to assume that somewhere else in the output, there's a description of `info:whatever/interviewees/g88129610`? — Joshua Taylor, Nov 25 '13 at 20:38
@JoshuaTaylor, thanks for the response. If I'm understanding you correctly, the answer is no--the RDF/XML generated from dumping the model only contains statements where the subject is the model URI. However, the description for `info:whatever/interviewees/g88129610` does exist in the triplestore, and can be output with a separate query or a dump of the whole triplestore. I've added a little bit of detail above. — Jacob Brown, Nov 25 '13 at 20:55
Ah, if the `dump` function there just returns a model containing triples whose subjects are the resource at hand, then you're right, that won't contain the descriptions of the other resources that are related to it. It sounds like you might need to write something that does that for you. :\ — Joshua Taylor, Nov 25 '13 at 21:13
One thing occurs to me: you're using a `construct` query to get the data that you want, but you're worried that it doesn't grab enough, so you'd have to do recursive queries, or something like that. Here's a question: what results do you get from a `describe` query? I don't know what the default handler does, but maybe it's helpful, or perhaps it can be customized? — Joshua Taylor, Nov 25 '13 at 21:52
That's a great idea: `describe ?x { ?p ?x }` gets me very close to where I want to be and I'm trying other configurations... Thanks for the suggestion. — Jacob Brown, Nov 25 '13 at 22:26

score 1 · Accepted Answer · edited May 23 '17 at 11:56

There are two parts to this answer. The first is that the particular structure of the XML used in the RDF/XML serialization doesn't matter (insofar as the RDF data is concerned; you're still free to have a preference about what it looks like). The second part is about getting what you want (for aesthetic reasons) out of RDF.rb.

The particular XML structure of RDF/XML doesn't matter

RDF is a graph-based data representation. The basic piece of information in RDF is the triple, also called a statement, which has the form

subject predicate object

A whole bunch of those make up an RDF graph. Those RDF graphs can be serialized in a number of formats. Some are easy to read and write by hand, and others are more complex. Some serialization formats might have a single way of writing a given RDF graph, or define a canonical way, but most will give you a number of different ways to write the same RDF graph.

For instance, the following data (in Turtle):

@prefix : <http://example.org/> .

<info:repository/videos/g91832990>
  a :video ;
  :contributor <info:repository/interviewees/g88129610> ;
  :title "Test Video" .

<info:repository/interviewees/g88129610>
  a :person ;
  :name "Creator" .

can be serialized in RDF/XML in different ways, because the format allows for lots of shorthand notation. For instance, with Jena, if I serialize as (plain) RDF/XML, I get:

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns="http://example.org/" > 
  <rdf:Description rdf:about="info:repository/videos/g91832990">
    <rdf:type rdf:resource="http://example.org/video"/>
    <contributor rdf:resource="info:repository/interviewees/g88129610"/>
    <title>Test Video</title>
  </rdf:Description>
  <rdf:Description rdf:about="info:repository/interviewees/g88129610">
    <rdf:type rdf:resource="http://example.org/person"/>
    <name>Creator</name>
  </rdf:Description>
</rdf:RDF>

but if I serialize as RDF/XML-ABBREV, I get:

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns="http://example.org/">
  <video rdf:about="info:repository/videos/g91832990">
    <contributor>
      <person rdf:about="info:repository/interviewees/g88129610">
        <name>Creator</name>
      </person>
    </contributor>
    <title>Test Video</title>
  </video>
</rdf:RDF>

Those are the same RDF graph. The latter might be a bit more expensive to write, since it uses more abbreviations, but they are the same RDF graph.
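To make "the same RDF graph" concrete, here is a minimal sketch in plain Ruby. It assumes triples are written as simple [subject, predicate, object] arrays (not RDF.rb statements), and `same_graph?` and the `ex:` prefix are names made up for this illustration. The two lists hold the triples in the order each RDF/XML document above presents them. Comparing them as sets only works here because there are no blank nodes.

```ruby
require 'set'

VIDEO  = 'info:repository/videos/g91832990'
PERSON = 'info:repository/interviewees/g88129610'

# Triples in the order the plain RDF/XML output lists them.
FLAT = [
  [VIDEO,  'rdf:type',       'ex:video'],
  [VIDEO,  'ex:contributor', PERSON],
  [VIDEO,  'ex:title',       'Test Video'],
  [PERSON, 'rdf:type',       'ex:person'],
  [PERSON, 'ex:name',        'Creator']
].freeze

# Triples in the order the nested (abbreviated) output lists them.
NESTED = [
  [VIDEO,  'rdf:type',       'ex:video'],
  [VIDEO,  'ex:contributor', PERSON],
  [PERSON, 'rdf:type',       'ex:person'],
  [PERSON, 'ex:name',        'Creator'],
  [VIDEO,  'ex:title',       'Test Video']
].freeze

# Hypothetical helper: a graph is a set of triples, so the order and
# nesting used by a serialization are irrelevant to equality.
def same_graph?(a, b)
  a.to_set == b.to_set
end

puts same_graph?(FLAT, NESTED)
```

Dropping or changing any one triple makes the comparison fail, which is the sense in which the two serializations differ only in layout.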

However, I'd like this XML representation to resolve the rdf:resource references. That is, I'd like the XML to look more like this (which is what I get when I do a dump of the whole repository/triplestore):
<ns0:video rdf:about="info:repository/videos/g91832990">
  <ns1:contributor>
    <ns2:person rdf:about="info:repository/interviewees/g88129610">
      <ns2:name>Creator</ns2:name>
    </ns2:person>
  </ns1:contributor>
  <ns1:title>Test Video</ns1:title> 
</ns0:video>

It's OK to have aesthetic preferences, as long as you recognize that dumping the model in one format versus another doesn't change what graph you're getting. The structure of the serialization won't affect the results of your SPARQL queries, since the SPARQL query is based on the RDF graph, not the serialization. In fact, trying to access RDF by using XML tools and the RDF/XML serialization is really a bad idea, as I've discussed in this answer to "How to access OWL documents using XPath in Java?".

Getting abbreviated RDF/XML with RDF.rb

According to its website, RDF.rb supports a number of serialization formats (emphasis added):

RDF::NTriples
RDF::JSON (plugin)
RDF::N3 (plugin)
RDF::Raptor::RDFXML (plugin)
RDF::Raptor::Turtle (plugin)
RDF::RDFa (plugin)
RDF::RDFXML (plugin)
RDF::Trix (plugin)

Note that there are two for RDF/XML there, one through Raptor and one from RDF.rb. At least one of those should provide support for the more concise bits of RDF/XML. I haven't used RDF.rb lately, but I seem to recall that the Raptor libraries provide a number of options, so that might be a good bet here. The built-in one might have something too, of course.

If you start digging around in the source for rdf-rdfxml, you'll find that the writer has an initialization option that might help you out here:

# @option options [Integer]  :max_depth (3)
#   Maximum depth for recursively defining resources
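
A writer can only nest descriptions that are actually in the data it is given, so the statements about related resources have to be collected first. Here is a minimal sketch of that collection step in plain Ruby, following references to arbitrary depth. It assumes triples are simple [subject, predicate, object] arrays standing in for RDF statements; `reachable_triples`, the `ex:` prefix, and the unrelated `videos/other` resource are made up for the example.

```ruby
require 'set'

VIDEO_URI  = 'info:repository/videos/g91832990'
PERSON_URI = 'info:repository/interviewees/g88129610'

# Plain arrays standing in for the statements in a triplestore.
STORE = [
  [VIDEO_URI,  'rdf:type',       'ex:video'],
  [VIDEO_URI,  'ex:contributor', PERSON_URI],
  [VIDEO_URI,  'ex:title',       'Test Video'],
  [PERSON_URI, 'rdf:type',       'ex:person'],
  [PERSON_URI, 'ex:name',        'Creator'],
  # Unrelated resource, made up for this sketch; it should be skipped.
  ['info:repository/videos/other', 'ex:title', 'Another Video']
].freeze

# Hypothetical helper: breadth-first walk that collects every triple
# reachable from `start` by following objects that are also subjects.
def reachable_triples(triples, start)
  by_subject = triples.group_by(&:first)
  seen = Set.new([start])
  queue = [start]
  result = []
  until queue.empty?
    subject = queue.shift
    (by_subject[subject] || []).each do |triple|
      result << triple
      object = triple[2]
      # Follow an object only if it is described in the store and has
      # not been visited yet (this also guards against cycles).
      queue << object if by_subject.key?(object) && seen.add?(object)
    end
  end
  result
end

reachable_triples(STORE, VIDEO_URI).each { |t| puts t.join(' ') }
```

With RDF.rb, the same walk could presumably be run over the repository's statements and the result handed to the RDF/XML writer with a large enough `:max_depth`; I haven't verified that end to end.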

Wow, thanks for the detailed response. I'm getting the raptor libs to see if they might help, but I think you're right in your comment above, that I'll have to write my own RDF loader. The reason I ask this question is that I want to have a nice RDF/XML "chunk" (not the whole store), that I can then style with a bit of XSLT into a nice XML representation of my resource. — Jacob Brown, Nov 25 '13 at 21:33
I did notice the `:max_depth` option, but I don't think it does what I want it to do: e.g., `RDF::RDFXML::Writer.buffer(:max_depth => 30) { |w| v.each_statement { |s| w << s } }` doesn't change anything. — Jacob Brown, Nov 25 '13 at 21:35
@kardeiz Well, as you've described it, the model you're getting from `dump` doesn't have any depth to begin with. If you try setting it to, e.g., `0`, with the dump of the triplestore, do you see a difference? — Joshua Taylor, Nov 25 '13 at 21:39
You're right--when I do a dump of the triplestore with `:max_depth => 0`, I get a "flattened" representation of the store. That is, for example, the section about `info:whatever/videos/g91832990` looks like my first example above, rather than my second example (which is how it looks with the default `:max_depth => 3`). — Jacob Brown, Nov 25 '13 at 21:45
@kardeiz OK, you're now at the point where you can write a SPARQL construct query that will get you the model that you want, and you can specify an arbitrary deep `max_depth` to ensure that you'll get the kind of output you want. — Joshua Taylor, Nov 25 '13 at 21:51
