
I have a somewhat strange problem. I have an internal website that publishes something meant to be similar to an RSS feed, i.e. a page with XML content containing some crucial information.

A single entry of the XML (there are dozens of entries) looks like this:

<?xml version='1.0' encoding='UTF-8'?>
<nvd xmlns:scap-core="http//0.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:patch="http//patch/0.1" xmlns="http//obj/0.1" xmlns:lang="http//lang/2.0" xmlns:cvss="http//cvss-v2/0.2" xmlns:object="http//object/0.4" nvd_xml_version="2.0" pub_date="2014-02-25T10:00:00" xsi:schemaLocation="http//patch/0.1 http//schema/patch_0.1.xsd http//0.1 http//schema/scap-core_0.1.xsd http//obj/0.1 http//schema/nvd-cve-feed_2.0.xsd">
  <entry id="0528">
    <object:configuration id="site.com/">
      <lang:logical-test negate="false" operator="OR">
        <lang:fact-ref name="version:2.6.0"/>
        <lang:fact-ref name="version:2.6.1"/>
        <lang:fact-ref name="version:2.6.2"/>
        <lang:fact-ref name="version:2.6.3"/>
      </lang:logical-test>
    </object:configuration>
    <object:list>
      <object:product>version:2.6.3</object:product>
      <object:product>version:2.6.0</object:product>
      <object:product>version:2.6.1</object:product>
      <object:product>version:2.6.2</object:product>
    </object:list>
    <object:id>0528</object:id>
    <object:published-datetime>2014-02-17T11:55:04.787-05:00</object:published-datetime>
    <object:last-modified-datetime>2014-02-21T09:14:10.780-05:00</object:last-modified-datetime>
    <object:cwe id="264"/>
  </entry>

I would like to read this XML in order to put those values into my database. My approach is as follows:

$ch = curl_init();

if (FALSE === $ch) {
    throw new Exception('failed to initialize');
}

curl_setopt($ch, CURLOPT_URL, "internal.adres.com");
curl_setopt($ch, CURLOPT_FRESH_CONNECT, TRUE);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);

$content = curl_exec($ch);
if (FALSE === $content) {
    throw new Exception('download failed: ' . curl_error($ch));
}
curl_close($ch);

$xml = new SimpleXMLElement($content);

foreach ($xml as $obj) {
    var_dump($obj);
    break;
}

And here is where the magic happens. When I execute var_dump($xml) I get a list of objects, but those objects have only the id field (the rest of the fields, such as product or the datetimes, are missing).

The result of var_dump($obj) is as follows:

object(SimpleXMLElement)#3 (1) { ["@attributes"]=> array(1) { ["id"]=> string(13) "0528" } } 

How can I get all the fields of this XML?

Mithrand1r
  • It would be handy to have an XML document that is valid (e.g. your closing list tag is objerable-software-list above), with all the levels of nesting in it (you clearly have another level above the one you're actually looping over) and with the namespace headers. That makes it much easier for people to reproduce your issue. – Rob Baillie Feb 26 '14 at 09:01
  • If you var_dump($content), what do you get? Are you *sure* it's the XML you describe? – Rob Baillie Feb 26 '14 at 09:44
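One way to check both points is to parse the downloaded string with libxml's internal error buffer switched on and list whatever diagnostics come back. A minimal sketch: the inline $content is a trimmed stand-in for the curl_exec() result (an assumption, so it runs without the internal site), and with the question's feed the messages are expected to concern the namespace declarations.

```php
<?php
// Trimmed stand-in for the string returned by curl_exec() in the question (assumption).
$content = <<<'XML'
<nvd xmlns="http//obj/0.1" xmlns:object="http//object/0.4">
  <entry id="0528">
    <object:id>0528</object:id>
  </entry>
</nvd>
XML;

// Buffer libxml diagnostics instead of depending on display_errors.
libxml_use_internal_errors(true);
$xml = simplexml_load_string($content);

$messages = array();
foreach (libxml_get_errors() as $error) {
    // $error is a LibXMLError; its level distinguishes warnings from errors.
    $messages[] = "level {$error->level}, line {$error->line}: " . trim($error->message);
}
libxml_clear_errors();

var_dump($xml !== false); // false would mean the string is not well-formed XML
print_r($messages);
```

If the parse succeeds but the list contains namespace warnings, the document itself is being read, and the missing fields are a namespace issue rather than a download issue.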

2 Answers


You are only looking at the attributes of the <entry> element. Loop over the children of each <entry> as well to reach the nested elements.

Justinas
  • Do you mean putting a loop inside the loop, like this? `foreach ($xml as $obj){ foreach ($obj as $obj1){var_dump($obj1);break; } }` I get null out of it... – Mithrand1r Feb 26 '14 at 08:54
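Assuming the entries look like the sample in the question, the nested loop has nothing to visit: without a namespace argument, SimpleXML's iteration and children() only match elements with no namespace or with an unprefixed default namespace, so the object:-prefixed children of <entry> are skipped. A minimal sketch comparing the two, using a trimmed inline copy of the feed instead of the curl download:

```php
<?php
// Trimmed copy of the question's feed; only the namespace declarations used here are kept.
$content = <<<'XML'
<nvd xmlns="http//obj/0.1" xmlns:object="http//object/0.4">
  <entry id="0528">
    <object:id>0528</object:id>
    <object:published-datetime>2014-02-17T11:55:04.787-05:00</object:published-datetime>
  </entry>
</nvd>
XML;

// The namespace URIs are not absolute, so libxml warns; keep the warnings out of the output.
libxml_use_internal_errors(true);
$xml = new SimpleXMLElement($content);

foreach ($xml as $entry) {
    // No namespace selected: the object:-prefixed children are not matched.
    $plain = count($entry->children());
    // Selecting the namespace by its prefix ('object', true) makes them visible.
    $prefixed = count($entry->children('object', true));
    echo "plain: $plain, object: $prefixed\n";
}
```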

Simplifying the XML you supplied (removing the namespaces, correcting a closing tag and giving it a header), I have put together the following example.

It shows some of the different methods you can use to access attributes and nodes in your document.

In short:

  • Attributes can be referenced as array elements of the node.
    • E.g. $node['id']
  • Child nodes can be referenced as member variables, which can then be looped over as arrays.
    • E.g. $node->subNode or foreach( $node->subNode as $subNode )
  • You can chain references together
    • E.g. $node->subNode[0]['id']

I hope the following example makes sense with your structure...

<?php

$content = '<?xml version="1.0"?>
<entries>
  <entry id="0528">
    <configuration id="google.com">
      <logical negate="false" operator="OR">
        <fact name="1.0.0"/>
      </logical>
    </configuration>
    <list>
      <product>1.0.0</product>
    </list>
    <id>0528</id>
    <datetime>2014-02-17T11:55:04.787-05:00</datetime>
    <last-modified-datetime>2014-02-21T09:14:10.780-05:00</last-modified-datetime>
  </entry>
</entries>';


$xml = new SimpleXMLElement($content);

foreach ($xml as $entry){

    // attributes can be referenced as array elements
    $entryId = $entry['id'];

    echo( "Entry id is {$entryId}\r\n" );

    // Sub-nodes can be referenced as member variables and looped over
    foreach( $entry->configuration as $configuration ) {

        $configurationId = $configuration['id'];
        echo( "Configuration id is {$configurationId}\r\n" );

        foreach( $configuration->logical as $logical ) {

            // You can string the methods together like this:
            $factName = $logical->fact[0]['name'];
            echo( "Logical fact name = $factName\r\n" );
        }

    }

}

?>
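The example above walks <configuration>; the fields the question is actually after (id, the product list and the datetimes) can be gathered the same way into plain arrays, ready to bind to a database insert. A sketch against the same simplified, namespace-free XML, with a second product added for illustration; the $rows layout is hypothetical, not a schema:

```php
<?php
// Same simplified structure as above; the second <product> is added for illustration.
$content = <<<'XML'
<entries>
  <entry id="0528">
    <list>
      <product>1.0.0</product>
      <product>1.0.1</product>
    </list>
    <id>0528</id>
    <datetime>2014-02-17T11:55:04.787-05:00</datetime>
    <last-modified-datetime>2014-02-21T09:14:10.780-05:00</last-modified-datetime>
  </entry>
</entries>
XML;

$xml = new SimpleXMLElement($content);
$rows = array();

foreach ($xml->entry as $entry) {
    $products = array();
    foreach ($entry->list->product as $product) {
        // Cast to string, otherwise SimpleXMLElement objects end up in the array.
        $products[] = (string) $product;
    }

    // One plain PHP array per entry (hypothetical layout).
    $rows[] = array(
        'id'            => (string) $entry->id,
        'published'     => (string) $entry->datetime,
        'last_modified' => (string) $entry->{'last-modified-datetime'},
        'products'      => $products,
    );
}

print_r($rows);
```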

Given the updated question, though, the issue seems to be related to namespaces.

You could strip those definitions out, although this probably isn't the recommended way round it: registering the namespaces is probably the way to go, but they don't appear to have valid URIs, so that may not be an option.

My example becomes:

<?php

$content = '<?xml version=\'1.0\' encoding=\'UTF-8\'?>
<nvd xmlns:scap-core="http//0.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:patch="http//patch/0.1" xmlns="http//obj/0.1" xmlns:lang="http//lang/2.0" xmlns:cvss="http//cvss-v2/0.2" xmlns:object="http//object/0.4" nvd_xml_version="2.0" pub_date="2014-02-25T10:00:00" xsi:schemaLocation="http//patch/0.1 http//schema/patch_0.1.xsd http//0.1 http//schema/scap-core_0.1.xsd http//obj/0.1 http//schema/nvd-cve-feed_2.0.xsd">
  <entry id="0528">
    <object:configuration id="site.com/">
      <lang:logical-test negate="false" operator="OR">
        <lang:fact-ref name="version:2.6.0"/>
        <lang:fact-ref name="version:2.6.1"/>
        <lang:fact-ref name="version:2.6.2"/>
        <lang:fact-ref name="version:2.6.3"/>
      </lang:logical-test>
    </object:configuration>
    <object:list>
      <object:product>version:2.6.3</object:product>
      <object:product>version:2.6.0</object:product>
      <object:product>version:2.6.1</object:product>
      <object:product>version:2.6.2</object:product>
    </object:list>
    <object:id>0528</object:id>
    <object:published-datetime>2014-02-17T11:55:04.787-05:00</object:published-datetime>
    <object:last-modified-datetime>2014-02-21T09:14:10.780-05:00</object:last-modified-datetime>
    <object:cwe id="264"/>
  </entry>
</nvd>';


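// Remove all xmlns and xmlns:prefix declarations
$content = preg_replace('/xmlns[^=]*="[^"]*"/i', '', $content);

// Gets rid of all namespace prefixes (object:list -> list); note it also runs over text content
$content = preg_replace('/[a-zA-Z]+:([a-zA-Z]+[\W=>])/', '$1', $content);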

$xml = new SimpleXMLElement($content);

foreach ($xml as $entry){

    // attributes can be referenced as array elements
    $entryId = $entry['id'];


    echo( "Entry id is {$entryId}\r\n" );

    // Sub-nodes can be referenced as member variables and looped over
    foreach( $entry->configuration as $configuration ) {

        $configurationId = $configuration['id'];
        echo( "Configuration id is {$configurationId}\r\n" );

        // Note for hyphenated nodes you need to wrap in quotes and curlies
        foreach( $configuration->{'logical-test'} as $logical ) {

            $testOperator = $logical['operator'];
            echo( "Test Operator = $testOperator\r\n" );

            // You can string the methods together like this:
            $factName = $logical->{'fact-ref'}[0]['name'];
            echo( "Logical fact name = $factName\r\n" );
        }

    }

}

?>

Which outputs:

Entry id is 0528
Configuration id is site.com/
Test Operator = OR
Logical fact name = version:2.6.0

Note that to access nodes with hyphens in their names you need to wrap the name in curly braces and quotes, e.g. $logical->{'fact-ref'}; without them, PHP parses $logical->fact-ref as a subtraction.
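For comparison, a sketch of the namespace-registration route instead of stripping: SimpleXMLElement::children() accepts a namespace URI (or a prefix plus true), which selects the object: and lang: children directly. It assumes the URIs from the question's <nvd> element; the constants NS_OBJECT and NS_LANG are hypothetical names. libxml still warns about the non-absolute URIs, so the warnings are buffered with libxml_use_internal_errors(true) rather than printed.

```php
<?php
// Namespace URIs copied from the question's <nvd> element (hypothetical constant names).
const NS_OBJECT = 'http//object/0.4';
const NS_LANG   = 'http//lang/2.0';

$content = <<<'XML'
<nvd xmlns="http//obj/0.1" xmlns:lang="http//lang/2.0" xmlns:object="http//object/0.4">
  <entry id="0528">
    <object:configuration id="site.com/">
      <lang:logical-test negate="false" operator="OR">
        <lang:fact-ref name="version:2.6.0"/>
        <lang:fact-ref name="version:2.6.1"/>
      </lang:logical-test>
    </object:configuration>
    <object:list>
      <object:product>version:2.6.1</object:product>
      <object:product>version:2.6.0</object:product>
    </object:list>
    <object:id>0528</object:id>
  </entry>
</nvd>
XML;

// The relative namespace URIs trigger libxml warnings; buffer them instead of printing.
libxml_use_internal_errors(true);
$xml = new SimpleXMLElement($content);

foreach ($xml->entry as $entry) {
    // <entry> is in the unprefixed default namespace, so plain access works as before.
    echo "Entry id is {$entry['id']}\n";

    // Switch to the object: namespace to reach the prefixed children.
    $obj = $entry->children(NS_OBJECT);
    $config = $obj->configuration;

    // attributes() with no argument reads attributes in no namespace,
    // which is where unprefixed attributes such as id live.
    echo "Configuration id is {$config->attributes()->id}\n";

    foreach ($config->children(NS_LANG)->{'logical-test'} as $test) {
        echo "Test operator = {$test->attributes()->operator}\n";
        foreach ($test->{'fact-ref'} as $fact) {
            echo "Fact name = {$fact->attributes()->name}\n";
        }
    }

    foreach ($obj->list->product as $product) {
        echo "Product = {$product}\n";
    }
}
```

Passing 'object', true to children() selects by prefix instead; matching on the URI keeps working even if the feed later changes its prefixes.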

Rob Baillie
  • But the whole point is that the `$entry` object contains nothing more than the `id` attribute. I cannot access anything else (even `var_dump($entry)` shows only the id). The result of your script is just a loop through the entries' ids. – Mithrand1r Feb 26 '14 at 09:29
  • If I replace my example xml with your version, including its namespaces, I get validation errors because the URIs in the namespaces are not valid. – Rob Baillie Feb 26 '14 at 09:43
  • If you var_dump($content), what do you get - are you *sure* it's the xml you describe – Rob Baillie Feb 26 '14 at 09:43
  • It is exactly the same XML in `$content` as on the website. – Mithrand1r Feb 26 '14 at 11:28
  • If it was, it would throw XML validation errors and you'd get nothing. However, if I add a closing `nvd` tag then it kind of works. That is, I get a lot of warnings about the namespace definitions, and then the behaviour you describe. I believe that the namespaces are causing your issues. Do you have warnings switched on? – Rob Baillie Feb 26 '14 at 13:47
  • Rob, thank you for your help. I figured out that the problem is connected with `namespaces` and that the only way to deal with it is something like what is described at http://www.php.net/manual/en/simplexmlelement.registerxpathnamespace.php, but I don't know how I can build the child-parent relation with it... – Mithrand1r Feb 26 '14 at 13:51
  • One option is to strip the namespace definitions out as per: http://stackoverflow.com/a/7641649/2982874. I wouldn't recommend it, but with a little tweak to the second regex it seems to work with your example. I'll update the answer... – Rob Baillie Feb 26 '14 at 13:54
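On the XPath question in these comments: registerXPathNamespace() plus a relative query from each <entry> keeps the child-parent relation, because xpath() called on an element uses that element as the context node. A sketch under the same assumptions as above (trimmed inline feed instead of the curl download; the prefixes n and o are arbitrary, only the URIs must match the document). XPath 1.0 has no default namespace, so <entry> itself needs a registered prefix, and as far as I understand a registration applies only to the SimpleXMLElement it was made on, hence the re-registration inside the loop.

```php
<?php
$content = <<<'XML'
<nvd xmlns="http//obj/0.1" xmlns:object="http//object/0.4">
  <entry id="0528">
    <object:list>
      <object:product>version:2.6.1</object:product>
      <object:product>version:2.6.0</object:product>
    </object:list>
    <object:id>0528</object:id>
    <object:published-datetime>2014-02-17T11:55:04.787-05:00</object:published-datetime>
  </entry>
</nvd>
XML;

libxml_use_internal_errors(true); // buffer the non-absolute-URI warnings
$xml = new SimpleXMLElement($content);

// Arbitrary prefix for the default namespace, which XPath cannot address without one.
$xml->registerXPathNamespace('n', 'http//obj/0.1');

$products = array();
foreach ($xml->xpath('/n:nvd/n:entry') as $entry) {
    // Register again on the element used as the query context.
    $entry->registerXPathNamespace('o', 'http//object/0.4');

    $id = (string) $entry['id'];
    // Relative path: only the products belonging to this entry.
    foreach ($entry->xpath('o:list/o:product') as $product) {
        $products[$id][] = (string) $product;
    }

    $published = $entry->xpath('o:published-datetime');
    echo "$id published ", (string) $published[0], "\n";
}

print_r($products);
```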