
I am trying to use some xpath expressions to extract info from an XML file that looks like this (it is an OAI-PMH protocol response):

<?xml version="1.0" encoding="UTF-8"?>

<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/
                             http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
  <responseDate>2013-10-11T09:24:55Z</responseDate>
  <request verb="ListRecords" metadataPrefix="oai_dc">http://request.url.com/oai</request>
  <ListRecords>
    <record>
      <header>
        <identifier>oai:identifier:item1</identifier>
        <datestamp>2012-06-07T12:03:53Z</datestamp>
        <setSpec>set:identifier</setSpec>
      </header>
      <metadata>
        <oai_dc:dc
            xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
            xmlns:dc="http://purl.org/dc/elements/1.1/"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/
                                http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
          <dc:title xml:lang="en-US">Title</dc:title>
          <dc:creator>creator</dc:creator>
          <dc:description xml:lang="en-US">abstract</dc:description>
          <dc:publisher xml:lang="en-US">publisher</dc:publisher>
          <dc:contributor xml:lang="en-US"></dc:contributor>
          <dc:date>2011-10-18</dc:date>
          <dc:type xml:lang="en-US"></dc:type>
          <dc:format>application/pdf</dc:format>
          <dc:identifier>identifier</dc:identifier>
          <dc:source xml:lang="en-US">source</dc:source>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
     ...
    </record>
     ...
  </ListRecords>
</OAI-PMH>

Ultimately I will have to write some Java code to do this, but I'd like to learn more about XPath and namespaces first, so I am using the command-line tool xqilla.

After some research (e.g. this) and many failed attempts, I tried the following expression:

//*[local-name()='title']

but I get the following error:

:1:22: error: No namespace for prefix 'xs' [err:XPST0081]

Could someone point me in the right direction, please? The xqilla documentation has not proven very helpful so far.

Thanks.

Edit: Since the title does not exactly match what is being asked here, a more general follow-up question would be: how does one define namespaces when using xqilla? If I try the expression:

//dc:title

the error I get is

/tmp/foo.xq:1:3: error: No namespace for prefix 'dc' [err:XPST0081]

I am running xqilla like this:

xqilla -p -i oai_response.xml foo.xq
Community
  • Are you using XQuery or XPath? Because in XQuery you can just generally use `declare namespace dc = "http://purl.org/dc/elements/1.1/";` – Tomalak Oct 14 '13 at 12:55
  • Interesting, but I am using XPath only. Thanks. – Panagiotis Koutsourakis Oct 14 '13 at 14:18
  • I'm sorry, then I have no idea. For XPath, namespaces have to be defined in advance, before the expression is interpreted. For a command-line XPath interpreter I would expect a command-line switch to declare namespaces. I've looked through the XQilla documentation for a while and did not find anything like that. – Tomalak Oct 14 '13 at 14:26

1 Answer


XQilla can do XPath but by default it uses XQuery.

You can create an XQuery file like this, e.g. my.file.xquery:

declare namespace dc="http://purl.org/dc/elements/1.1/";
doc("my.file.xml")//dc:title

And then run it:

xqilla my.file.xquery

If you want to use only XPath, I'm not sure how to specify the namespace. What you can do is use the namespace wildcard, so put this in my.file.xpath:

//*:title

And run it with the XML document passed to -i and the expression file as the last argument:

xqilla -p -i my.file.xml my.file.xpath
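
Since the question mentions that the final code will be in Java, here is a minimal sketch of how the same two ideas (binding the dc prefix up front, or ignoring the namespace) might look with the JDK's built-in javax.xml.xpath API. The class name DcTitleExtractor, the evaluate helper and the shortened sample document are made up for illustration. Note that the JDK's default XPath engine implements XPath 1.0, so the *:title wildcard is not available there; local-name() is used as the namespace-agnostic fallback instead.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of XQilla or any library.
class DcTitleExtractor {
    static final String DC_NS = "http://purl.org/dc/elements/1.1/";

    // Evaluates an XPath 1.0 expression against the given XML text and
    // returns the text content of every matching node.
    static List<String> evaluate(String xml, String expression) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // without this, prefixes are not resolved
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            // Plays the role of "declare namespace dc = ..." in XQuery.
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override
                public String getNamespaceURI(String prefix) {
                    if (prefix == null) {
                        throw new IllegalArgumentException("prefix must not be null");
                    }
                    return "dc".equals(prefix) ? DC_NS : XMLConstants.NULL_NS_URI;
                }

                @Override
                public String getPrefix(String uri) {
                    return DC_NS.equals(uri) ? "dc" : null;
                }

                @Override
                public Iterator<String> getPrefixes(String uri) {
                    return DC_NS.equals(uri)
                            ? Collections.singletonList("dc").iterator()
                            : Collections.<String>emptyIterator();
                }
            });

            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Shortened stand-in for the OAI-PMH response in the question.
        String xml =
              "<OAI-PMH xmlns=\"http://www.openarchives.org/OAI/2.0/\">"
            + "<metadata>"
            + "<oai_dc:dc xmlns:oai_dc=\"http://www.openarchives.org/OAI/2.0/oai_dc/\""
            + " xmlns:dc=\"http://purl.org/dc/elements/1.1/\">"
            + "<dc:title>Title</dc:title>"
            + "</oai_dc:dc>"
            + "</metadata>"
            + "</OAI-PMH>";

        // Prefix bound through the NamespaceContext.
        System.out.println(evaluate(xml, "//dc:title"));
        // Namespace-agnostic fallback that also works in XPath 1.0.
        System.out.println(evaluate(xml, "//*[local-name()='title']"));
    }
}
```

The prefix used in the expression only has to match the one registered in the NamespaceContext, not the one used in the document; what matters is the namespace URI.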
thehpi