Hello Everyone,

I'm trying to access data in an XML file:

<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:dc="http://dublincore.org/documents/dcmi-    namespace/" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/     http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd";>
 <responseDate>2013-04-15T12:14:31Z</responseDate>
 <ListRecords>
 <record>
 <header>
 <identifier>
 a1b31ab2-9efe-11df-9922-efbb156aa6c1:01442b82-59a4-627e-800f-c63de74fc109
 </identifier>
 <datestamp>2012-08-16T14:42:52Z</datestamp>
 </header>
 <metadata>
 <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd";>
 <dc:description>...</dc:description>
 <dc:date>1921</dc:date>
 <dc:identifier>K11510</dc:identifier>
 <dc:source>Waterschap Vallei & Eem</dc:source>
 <dc:source>...</dc:source>
 <dc:source>610</dc:source>
 <dc:coverage>Bunschoten</dc:coverage>
 <dc:coverage>Veendijk</dc:coverage>
 <dc:coverage>Spakenburg</dc:coverage>
 </oai_dc:dc>
 </metadata>
 <about>...</about>
 </record>

This is an example of the XML.

I need to access data such as dc:date, dc:source, etc.

Does anyone have any ideas?

Best regards, Tim

-- UPDATE --

I'm now trying this:

foreach( $xml->ListRecords as $records )
{
    foreach( $records AS $record )
    {
        $data = $record->children( 'http://www.openarchives.org/OAI/2.0/oai_dc/' );

        $rows = $data->children( 'http://purl.org/dc/elements/1.1/' );

        echo $rows->date;

        break;
    }

    break;
}
Tim Hanssen
  • possible duplicate of http://stackoverflow.com/questions/6578832/how-to-convert-xml-into-array-in-php – mohammad mohsenipur Apr 15 '13 at 11:01
  • Where do you get stuck? Are existing questions like [How do I parse XML containing custom namespaces using SimpleXML?](http://stackoverflow.com/q/1133897/2261774) or [How to access element like with simplexml?](http://stackoverflow.com/q/1307459/2261774) not helpful? – M8R-1jmw5r Apr 15 '13 at 11:01
  • I can access oai_dc (or I think I can) using http://www.sitepoint.com/simplexml-and-namespaces/ but I cannot access the children of this namespace. – Tim Hanssen Apr 15 '13 at 11:06
  • @TimHanssen: Please show the code where you think that is. Even if it does not work, a good question shows what you've tried so far so your problem has more context (you don't need to post your whole code, just the relevant part where you get stuck). – M8R-1jmw5r Apr 15 '13 at 11:12
  • I updated the question with the code I'm using. – Tim Hanssen Apr 15 '13 at 11:16

4 Answers

3

You have nested elements that are in different XML namespaces. Concretely, two additional namespaces are involved:

$nsUriOaiDc = 'http://www.openarchives.org/OAI/2.0/oai_dc/';
$nsUriDc    = 'http://purl.org/dc/elements/1.1/';

The first one is for the <oai_dc:dc> element, which contains the elements of the second namespace: the <dc:*> elements such as <dc:description>. Those are the elements you're looking for.
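To check which namespaces a document actually declares, SimpleXML can list them. The following is only a minimal sketch: `$xmlString` is a trimmed, well-formed stand-in for the XML in the question (closing tags added, most elements left out), not the full response.

```php
<?php
// Assumption: a shortened, well-formed version of the question's sample.
$xmlString = <<<'XML'
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
 <ListRecords>
  <record>
   <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/">
     <dc:date>1921</dc:date>
    </oai_dc:dc>
   </metadata>
  </record>
 </ListRecords>
</OAI-PMH>
XML;

$xml = simplexml_load_string($xmlString);

// Passing true also collects namespaces declared on descendant elements,
// not only those declared on the root element.
$namespaces = $xml->getDocNamespaces(true);

foreach ($namespaces as $prefix => $uri) {
    // The default namespace is reported with an empty prefix.
    echo ($prefix === '' ? '(default)' : $prefix), ' => ', $uri, "\n";
}
```

The returned array is keyed by prefix, so the URIs can be looked up instead of being hard-coded. If the same prefix is declared more than once with different URIs, as with `dc` in the full sample from the question, only one entry can survive under that key, so it is worth checking which URI you got.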

Your code shows that you already have the right idea of how this works:

$data = $record->children( 'http://www.openarchives.org/OAI/2.0/oai_dc/' );

$rows = $data->children( 'http://purl.org/dc/elements/1.1/' );

However, there is a small mistake: the $data children are not children of $record but of $record->metadata.

You also do not need to nest two foreach loops inside each other. A code example:

$nsUriOaiDc = 'http://www.openarchives.org/OAI/2.0/oai_dc/';

$nsUriDc    = 'http://purl.org/dc/elements/1.1/';

$records = $xml->ListRecords->record;

foreach ($records as $record)
{    
    $data = $record->metadata->children($nsUriOaiDc);

    $rows = $data->children($nsUriDc);

    echo $rows->date;

    break;
}

/** output: 1921 **/
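The question also asks for dc:source, which occurs several times per record. As a rough sketch along the same lines (again with a shortened, well-formed copy of the sample in `$xmlString`, where the `&` is written as `&amp;`), repeated elements can be collected by iterating over them:

```php
<?php
// Assumption: a shortened, well-formed version of the question's sample.
$xmlString = <<<'XML'
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
 <ListRecords>
  <record>
   <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/">
     <dc:date>1921</dc:date>
     <dc:source>Waterschap Vallei &amp; Eem</dc:source>
     <dc:source>...</dc:source>
     <dc:source>610</dc:source>
    </oai_dc:dc>
   </metadata>
  </record>
 </ListRecords>
</OAI-PMH>
XML;

$nsUriOaiDc = 'http://www.openarchives.org/OAI/2.0/oai_dc/';
$nsUriDc    = 'http://purl.org/dc/elements/1.1/';

$xml     = simplexml_load_string($xmlString);
$sources = array();

foreach ($xml->ListRecords->record as $record) {
    // Step into <oai_dc:dc>, then switch to the dc namespace for its children.
    $rows = $record->metadata->children($nsUriOaiDc)->dc->children($nsUriDc);

    // $rows->source iterates over every <dc:source> of this record.
    foreach ($rows->source as $source) {
        $sources[] = (string) $source; // cast the element to its text content
    }
}

print_r($sources);
```

The `(string)` cast matters here: without it the array would hold SimpleXMLElement objects rather than plain strings.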

If you are running into problems like these, you can make use of $record->asXML('php://output'); to show which element(s) you are currently traversing to.
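A small sketch of that kind of inspection, assuming the same shortened sample document in `$xmlString`. It compares the lookup from the question with the corrected one, using count(), getName() and asXML() to see what was actually selected:

```php
<?php
// Assumption: a shortened, well-formed version of the question's sample.
$xmlString = <<<'XML'
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
 <ListRecords>
  <record>
   <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/">
     <dc:date>1921</dc:date>
    </oai_dc:dc>
   </metadata>
  </record>
 </ListRecords>
</OAI-PMH>
XML;

$nsUriOaiDc = 'http://www.openarchives.org/OAI/2.0/oai_dc/';

$xml    = simplexml_load_string($xmlString);
$record = $xml->ListRecords->record[0];

// <record> has no direct children in the oai_dc namespace ...
$wrong = $record->children($nsUriOaiDc);
// ... but <metadata> does: the <oai_dc:dc> element.
$right = $record->metadata->children($nsUriOaiDc);

echo 'children of record:   ', count($wrong), "\n";
echo 'children of metadata: ', count($right), ' (', $right->getName(), ")\n";

// Dump the selected element to see exactly where you are in the tree.
echo $right->asXML(), "\n";
```

An empty selection like `$wrong` is a hint that the traversal went one level off before the namespace switch.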

M8R-1jmw5r
  • I had the same problem, thanks so much for posting this solution. Saved me a great deal of time! :) – Nathan Pitman Nov 12 '14 at 14:34
  • My problem is that I have to extract the namespace URLs from attributes of the header. But this is not working. I tried an example from here but that didn't help: http://php.net/manual/en/simplexmlelement.attributes.php – coder.in.me Oct 19 '16 at 14:11
0

I think this is what you're looking for. Hope it helps ;)

Julio María Meca Hansen
  • Hey Julio, I tried that, but I think because it's a namespace in a namespace it doesn't work like that. – Tim Hanssen Apr 15 '13 at 11:04
  • @TimHanssen: No, that should not cause you any problems. You just need to do it again - with multiple namespaces. – M8R-1jmw5r Apr 15 '13 at 11:11
  • So I tried using foreach( $xml->ListRecords as $records ) { foreach( $records AS $record ) { $data = $record->children( 'http://www.openarchives.org/OAI/2.0/oai_dc/' ); $rows = $data->children( 'http://purl.org/dc/elements/1.1/' ); echo $rows->date; break; } break; } I got the error: Warning: main(): Node no longer exists – Tim Hanssen Apr 15 '13 at 11:14
0

Use DOMDocument for this, for example to access dc:date:

  $STR='
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:dc="http://dublincore.org/documents/dcmi-    namespace/" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/     http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd";>
 <responseDate>2013-04-15T12:14:31Z</responseDate>
 <ListRecords>
 <record>
 <header> <identifier> a1b31ab2-9efe-11df-9922-efbb156aa6c1:01442b82-59a4-627e-800f-c63de74fc109 </identifier>
<datestamp>2012-08-16T14:42:52Z</datestamp>
</header>
<metadata>
 <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd";>
  <dc:description>...</dc:description>
  <dc:date>1921</dc:date>
  <dc:identifier>K11510</dc:identifier>
  <dc:source>Waterschap Vallei & Eem</dc:source>
  <dc:source>...</dc:source>
  <dc:source>610</dc:source>
  <dc:coverage>Bunschoten</dc:coverage>
  <dc:coverage>Veendijk</dc:coverage>
  <dc:coverage>Spakenburg</dc:coverage>
 </oai_dc:dc>
</metadata>
<about>...</about>
</record>';

  $dom = new DOMDocument;
  $STR = str_replace("&", "&amp;", $STR);  // escape bare & characters before parsing
  // $dom->substituteEntities = true;  // collapse &s going OUT to transformToXML()
  $dom->recover = TRUE;
  // parse with the lenient HTML parser; the leading PI hints that the input is UTF-8
  @$dom->loadHTML('<?xml encoding="UTF-8">' . $STR);
  // dirty fix: remove the processing instruction that was prepended above
  foreach ($dom->childNodes as $item) {
      if ($item->nodeType == XML_PI_NODE) {
          $dom->removeChild($item); // remove hack
      }
  }
  $dom->encoding = 'UTF-8'; // insert proper

  // first <dc> element, then the first <date> inside it
  print_r($dom->getElementsByTagName('dc')->item(0)->getElementsByTagName('date')->item(0)->textContent);

output:

 1921

Or access dc:source:

 $source = $dom->getElementsByTagName('dc')->item(0)->getElementsByTagName('source');
 foreach($source as $value){
     echo $value->textContent."\n";
 }

output:

Waterschap Vallei & Eem
...
610

Or collect everything into an array:

 $array = array();
 $source = $dom->getElementsByTagName('dc')->item(0)->getElementsByTagName("*");
 foreach($source as $value){

     $array[$value->localName][]=$value->textContent."\n";


 } 
 print_r($array);

output:

 Array
(
   [description] => Array
    (
        [0] => ...

    )

   [date] => Array
    (
        [0] => 1921

    )

   [identifier] => Array
    (
        [0] => K11510

    )

   [source] => Array
    (
        [0] => Waterschap Vallei & Eem

        [1] => ...

        [2] => 610

    )

   [coverage] => Array
    (
        [0] => Bunschoten

        [1] => Veendijk

        [2] => Spakenburg

    )

)
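If the input can be made well-formed XML (closing tags present and the `&` written as `&amp;`), a namespace-aware variant is possible with loadXML() and getElementsByTagNameNS() instead of the HTML parser. This is a different technique from the one above and only a sketch; `$xmlString` is a shortened stand-in for the real response.

```php
<?php
// Assumption: a shortened, well-formed version of the question's sample.
$xmlString = <<<'XML'
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
 <ListRecords>
  <record>
   <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/">
     <dc:date>1921</dc:date>
     <dc:identifier>K11510</dc:identifier>
     <dc:coverage>Bunschoten</dc:coverage>
     <dc:coverage>Veendijk</dc:coverage>
     <dc:coverage>Spakenburg</dc:coverage>
    </oai_dc:dc>
   </metadata>
  </record>
 </ListRecords>
</OAI-PMH>
XML;

$nsUriDc = 'http://purl.org/dc/elements/1.1/';

$dom = new DOMDocument();
$dom->loadXML($xmlString); // strict XML parser, keeps namespace information

$fields = array();

// '*' matches every local name, restricted to the dc namespace URI.
foreach ($dom->getElementsByTagNameNS($nsUriDc, '*') as $node) {
    $fields[$node->localName][] = $node->textContent;
}

print_r($fields);
```

Matching on the namespace URI rather than on the bare tag name avoids picking up unrelated elements that happen to share a local name such as `identifier`.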
mohammad mohsenipur
0

Using XPath makes dealing with namespaces more straightforward:

<?php

// load the XML into a DOM document
$doc = new DOMDocument;
$doc->load('oai-response.xml'); // or use $doc->loadXML($xml) for an XML string

// bind the DOM document to an XPath object
$xpath = new DOMXPath($doc);

// map all the XML namespaces to prefixes, for use in XPath queries
$xpath->registerNamespace('oai', 'http://www.openarchives.org/OAI/2.0/');
$xpath->registerNamespace('oai_dc', 'http://www.openarchives.org/OAI/2.0/oai_dc/');
$xpath->registerNamespace('dc', 'http://purl.org/dc/elements/1.1/');

// identify each record using an XPath query
// collect data as either strings or arrays of strings
foreach ($xpath->query('oai:ListRecords/oai:record/oai:metadata/oai_dc:dc') as $item) {
    $data = array(
        'date' => $xpath->evaluate('string(dc:date)', $item), // $item is the context for this query
        'source' => array(),
    );

    foreach ($xpath->query('dc:source', $item) as $source) {
        $data['source'][] = $source->textContent;
    }

    print_r($data);
}
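The same idea also works from SimpleXML, which has its own xpath() method. A minimal sketch, assuming a shortened, well-formed copy of the sample in `$xmlString`; the prefixes `oai` and `dc` are chosen here for the queries and do not have to match the prefixes used in the document:

```php
<?php
// Assumption: a shortened, well-formed version of the question's sample.
$xmlString = <<<'XML'
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
 <ListRecords>
  <record>
   <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/">
     <dc:date>1921</dc:date>
     <dc:coverage>Bunschoten</dc:coverage>
     <dc:coverage>Veendijk</dc:coverage>
    </oai_dc:dc>
   </metadata>
  </record>
 </ListRecords>
</OAI-PMH>
XML;

$xml = simplexml_load_string($xmlString);

// Bind prefixes to namespace URIs for use in XPath expressions.
$xml->registerXPathNamespace('oai', 'http://www.openarchives.org/OAI/2.0/');
$xml->registerXPathNamespace('dc', 'http://purl.org/dc/elements/1.1/');

// xpath() returns an array of SimpleXMLElement objects.
$dates = $xml->xpath('//oai:record/oai:metadata//dc:date');

$coverage = array();
foreach ($xml->xpath('//oai:record/oai:metadata//dc:coverage') as $element) {
    $coverage[] = (string) $element;
}

echo (string) $dates[0], "\n";
print_r($coverage);
```

Note that the default namespace of the document still needs a prefix (`oai` here) in the query, because XPath 1.0 treats unprefixed names as belonging to no namespace.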
Alf Eaton