I can't seem to get this basic XPath query working via XMLStarlet.

I'm sure I'm missing something obvious, but for the life of me I cannot figure out this syntax, so someone please illuminate me.

XMLStarlet command:

xml sel -t -m "//rdf:RDF/item" -v link -v description -v link ./sss.rdf

sss.rdf:

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://purl.org/rss/1.0/" xmlns:admin="http://webns.net/mvcb/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:ev="http://purl.org/rss/1.0/modules/event/" xmlns:syn="http://purl.org/rss/1.0/modules/syndication/" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/">
    <channel rdf:about="http://baltimore.craigslist.org/search/sss?catAbb=sss&amp;amp;format=rss&amp;amp;maxAsk=150&amp;amp;minAsk=50&amp;amp;query=ipod%20touch%205g&amp;amp;srchType=A">
        <title>craigslist baltimore | all for sale / wanted search "ipod touch 5g"</title>
        <link>http://baltimore.craigslist.org/search/sss?catAbb=sss&amp;amp;maxAsk=150&amp;amp;minAsk=50&amp;amp;query=ipod%20touch%205g&amp;amp;srchType=A</link>
        <description />
        <dc:language>en-us</dc:language>
        <dc:rights>&amp;copy; 2013 craigslist</dc:rights>
        <dc:publisher>robot@craigslist.org</dc:publisher>
        <dc:creator>robot@craigslist.org</dc:creator>
        <dc:source>http://baltimore.craigslist.org/search/sss?catAbb=sss&amp;amp;format=rss&amp;amp;maxAsk=150&amp;amp;minAsk=50&amp;amp;query=ipod%20touch%205g&amp;amp;srchType=A</dc:source>
        <dc:title>craigslist baltimore | all for sale / wanted search "ipod touch 5g"</dc:title>
        <dc:type>Collection</dc:type>
        <syn:updateBase>2013-09-20T09:23:41-07:00</syn:updateBase>
        <syn:updateFrequency>1</syn:updateFrequency>
        <syn:updatePeriod>hourly</syn:updatePeriod>
        <items>
            <rdf:Seq>
                <rdf:li rdf:resource="http://baltimore.craigslist.org/ele/4039527375.html" />
            </rdf:Seq>
        </items>
    </channel>
    <item rdf:about="http://baltimore.craigslist.org/ele/4039527375.html">
        <title><![CDATA[Unlocked Optimus Lg Phone (Baltimore) $150]]></title>
        <link>http://baltimore.craigslist.org/ele/4039527375.html</link>
        <description>OR WE CAN HAVE A SWAP FOR AN IPOD TOUCH 5g<![CDATA[
&#9679;Optimus Lg Phone For Sale At 150.00 The Original Price Was $180.00
&#9679;It Does Not Include The Charger, But You Can Find It At Walmart For $4.00
&#9679;The Phone Was Only Used For 2-3 Months
&# [...]]]></description>
        <dc:date>2013-09-01T10:14:06-07:00</dc:date>
        <dc:language>en-us</dc:language>
        <dc:rights>&amp;copy; 2013 craigslist</dc:rights>
        <dc:source>http://baltimore.craigslist.org/ele/4039527375.html</dc:source>
        <dc:title><![CDATA[Unlocked Optimus Lg Phone (Baltimore) $150]]></dc:title>
        <dc:type>text</dc:type>
        <dcterms:issued>2013-09-01T10:14:06-07:00</dcterms:issued>
    </item>
</rdf:RDF>

My Desired Output:

Unlocked Optimus Lg Phone (Baltimore) $150
OR WE CAN HAVE A SWAP FOR AN IPOD TOUCH 5g
    &#9679;Optimus Lg Phone For Sale At 150.00 The Original Price Was $180.00
    &#9679;It Does Not Include The Charger, But You Can Find It At Walmart For $4.00
    &#9679;The Phone Was Only Used For 2-3 Months
    &# [...]
http://baltimore.craigslist.org/ele/4039527375.html
  • You really shouldn't try to manipulate or query RDF using XML-based tools, since there are many possible RDF/XML serializations of the same RDF graph. See [this answer](http://stackoverflow.com/a/17052385/1281433) for more information on why. In this case, consider [this alternative serialization](http://pastebin.com/eeQFkm35) of your RDF. It's the _same_ RDF graph, and it's XML, but your query wouldn't work on it (nor would the query in kjhughes's answer). – Joshua Taylor Sep 23 '13 at 15:26
  • @JoshuaTaylor is entirely correct. Unless you're scraping fixed RDF or it somehow makes sense to assume the structure won't be changing, avoid XPath'ing over RDF if you want your solution to be at all robust. – kjhughes Sep 23 '13 at 16:09

1 Answer

This XmlStarlet command:

xml sel -N purl="http://purl.org/rss/1.0/" -t -m "//rdf:RDF/purl:item" -v purl:title -n -v purl:description -n -v purl:link -n ./sss.rdf

Yields the desired output:

Unlocked Optimus Lg Phone (Baltimore) $150
OR WE CAN HAVE A SWAP FOR AN IPOD TOUCH 5g
&amp;#9679;Optimus Lg Phone For Sale At 150.00 The Original Price Was $180.00
&amp;#9679;It Does Not Include The Charger, But You Can Find It At Walmart For $4.00
&amp;#9679;The Phone Was Only Used For 2-3 Months
&amp;# [...]
http://baltimore.craigslist.org/ele/4039527375.html

Explanation:

The key is that the input document declares a default namespace (xmlns="http://purl.org/rss/1.0/"), which puts item, title, description, and link in the http://purl.org/rss/1.0/ namespace. In XPath 1.0, an unprefixed name such as item only matches elements in no namespace, so the original XPaths matched nothing. Defining -N purl="http://purl.org/rss/1.0/" binds the purl prefix to that namespace so the XPaths can name these elements. The prefix itself is arbitrary; only the namespace URI has to match.
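
The same namespace behaviour can be reproduced outside XMLStarlet. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`, not part of the original answer; the inline document is a trimmed-down stand-in for sss.rdf (an assumption for illustration), and `root.findall(...)` on the `rdf:RDF` root plays the role of the `//rdf:RDF/...` match.

```python
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for sss.rdf: same root, same default namespace, one item.
RDF = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
    <item rdf:about="http://baltimore.craigslist.org/ele/4039527375.html">
        <title>Unlocked Optimus Lg Phone (Baltimore) $150</title>
        <link>http://baltimore.craigslist.org/ele/4039527375.html</link>
    </item>
</rdf:RDF>"""

root = ET.fromstring(RDF)

# Prefix-to-URI map, playing the role of XMLStarlet's -N option.
# The prefix name "purl" is arbitrary; only the URI has to match the document.
ns = {"purl": "http://purl.org/rss/1.0/"}

# Unprefixed name: looks for <item> in no namespace, so nothing matches.
unprefixed = root.findall("item")

# Prefixed name: resolves to {http://purl.org/rss/1.0/}item and matches.
prefixed = root.findall("purl:item", ns)

titles = [item.findtext("purl:title", namespaces=ns) for item in prefixed]
links = [item.findtext("purl:link", namespaces=ns) for item in prefixed]

print(len(unprefixed), len(prefixed))
print(titles)
print(links)
```

The child lookups need the prefix as well, which is why the XMLStarlet command uses purl:title, purl:description, and purl:link rather than the bare names.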

kjhughes
  • Note that this works for this particular RDF/XML serialization of the RDF graph, but it wouldn't work on [this RDF/XML serialization of the _same_ graph](http://pastebin.com/eeQFkm35). In general, XPath can't be used reliably to query RDF. This topic was discussed in the answers to [this question](http://stackoverflow.com/q/17036871/1281433). (That's not a slight against this answer, though; the OP asked for an XML-based answer, and this is a good one.) – Joshua Taylor Sep 23 '13 at 15:31
  • @JoshuaTaylor, agreed. There are times when XPath will suffice such as for quick scrapes/pulls from fixed RDF, but you're right to point out the robustness issue in general. +1 – kjhughes Sep 23 '13 at 15:59
  • Thank you so much! I'm still not entirely sure I understand, but if you have some good XMLStarlet tutorials that are readable for beginners, I'd really appreciate it! –  Sep 23 '13 at 18:12
  • @Lani, it's not a question of XMLStarlet (unless you meant that you didn't understand something about kjhughes's answer), but of RDF. The data you've shown is an RDF graph, serialized using an XML format. A given RDF graph can be serialized in lots of different ways, all still representing the same RDF graph. The differences can be significant, and an XPath query that works on one serialization might not work on another, even though they represent the same RDF graph. I was just pointing out that this solution might stop working if you end up with a different serialization of the same data. – Joshua Taylor Sep 23 '13 at 18:28
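
To make the serialization point in these comments concrete, here is a small sketch in Python (standard library only). Both inline documents are hypothetical examples written for this illustration, not the contents of the linked pastebin. They encode the same two RDF triples: the first uses a typed node element (`<item>`), as in the question's file, while the second uses `rdf:Description` with an explicit `rdf:type`, which RDF/XML treats as equivalent. The `item_links` helper is an assumed name, and its lookup mirrors the `//rdf:RDF/purl:item` match from the answer.

```python
import xml.etree.ElementTree as ET

NS = {"purl": "http://purl.org/rss/1.0/"}

# Serialization A: typed node element, the style used in the question's file.
TYPED = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
    <item rdf:about="http://baltimore.craigslist.org/ele/4039527375.html">
        <link>http://baltimore.craigslist.org/ele/4039527375.html</link>
    </item>
</rdf:RDF>"""

# Serialization B (hypothetical rewrite): same triples, but the type is
# stated with rdf:type on a generic rdf:Description element.
DESCRIPTION = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rss="http://purl.org/rss/1.0/">
    <rdf:Description rdf:about="http://baltimore.craigslist.org/ele/4039527375.html">
        <rdf:type rdf:resource="http://purl.org/rss/1.0/item"/>
        <rss:link>http://baltimore.craigslist.org/ele/4039527375.html</rss:link>
    </rdf:Description>
</rdf:RDF>"""


def item_links(xml_text):
    """Return the link text of every purl:item directly under the root."""
    root = ET.fromstring(xml_text)
    return [
        item.findtext("purl:link", namespaces=NS)
        for item in root.findall("purl:item", NS)
    ]


links_typed = item_links(TYPED)
links_description = item_links(DESCRIPTION)

print(links_typed)        # one link found
print(links_description)  # nothing found: there is no <item> element to match
```

An RDF-aware parser would report the same triples for both inputs, which is why querying the graph rather than the XML tree is the more robust route when the serialization is not under your control.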