22

I have an XML document that looks like this:

<Data 
  xmlns="http://www.domain.com/schema/data" 
  xmlns:dmd="http://www.domain.com/schema/data-metadata"
>
  <Something>...</Something>
</Data>

I am parsing the information using SimpleXML in PHP. I am dealing with arrays and I seem to be having a problem with the namespace.

My question is: How do I remove those namespaces? I read the data from an XML file.

Thank you!

Tomalak
jchimpo
  • If you'd like details: my original question was posted here, and a user already answered it (thanks!). But I found out that the namespace is causing his loops not to run, so they return an empty array. The original question is located here: http://stackoverflow.com/questions/1209301/php-simplexml-group-by-element-type – jchimpo Aug 07 '09 at 17:08

5 Answers

21

I found the answer above to be helpful, but it didn't quite work for me. This ended up working better:

// Gets rid of all namespace definitions 
$xml_string = preg_replace('/xmlns[^=]*="[^"]*"/i', '', $xml_string);

// Gets rid of all namespace references
$xml_string = preg_replace('/[a-zA-Z]+:([a-zA-Z]+[=>])/', '$1', $xml_string);
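
As a rough illustration of the two replacements, here is a minimal sketch applied to a document shaped like the one in the question. The `dmd:id` attribute and the `dmd:Info` element are made up for the example, and the patterns only handle double-quoted declarations and purely alphabetic names.

```php
<?php
// Sample input; dmd:id and dmd:Info are hypothetical additions for illustration.
$xml_string = <<<XML
<Data xmlns="http://www.domain.com/schema/data" xmlns:dmd="http://www.domain.com/schema/data-metadata">
  <Something dmd:id="1">first</Something>
  <dmd:Info>meta</dmd:Info>
</Data>
XML;

// Step 1: drop every xmlns / xmlns:prefix declaration (double-quoted values only).
$xml_string = preg_replace('/xmlns[^=]*="[^"]*"/i', '', $xml_string);

// Step 2: drop the "prefix:" part of element and attribute names.
$xml_string = preg_replace('/[a-zA-Z]+:([a-zA-Z]+[=>])/', '$1', $xml_string);

// The cleaned string can now be read without any namespace handling.
$xml  = simplexml_load_string($xml_string);
$info = (string) $xml->Info;            // formerly <dmd:Info>
$id   = (string) $xml->Something['id']; // formerly dmd:id="1"
```

Text content that happens to look like `word:word>` would be rewritten too, so this is only reasonable for input you control.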
Chris Lawrence
  • I'd get rid of "all namespace references" with something like this: $xml = preg_replace('/(<\/*)[^>:]+:/', '$1', $xml); – Silas Palmer Aug 28 '14 at 01:54
  • One of the few times in my life I've upvoted a solution to manipulate XML with regex. I really don't want to register a default namespace and needlessly clutter up my xpath queries. – But those new buttons though.. Nov 28 '18 at 19:39
  • Almost perfect. Needs to look for a potential space after the node name. Strips node content if it has a colon `Order:Num`, also doesn't find numeric keys `Content`. Try: `$xml_string = preg_replace('/(<\/|<)[a-zA-Z]+:([a-zA-Z0-9]+[ =>])/', '$1$2', $xml_string);` – M P Aug 07 '19 at 00:15
20

If you're using XPath, this is a limitation of XPath rather than of PHP; see this explanation on XPath and default namespaces for more info.

More specifically, it's the xmlns="" attribute on the root node that causes the problem. This means you'll need to register the namespace and then use a QName (a prefixed name) to refer to elements.

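// Assumes the loaded document uses the namespace URI registered below
$feed = simplexml_load_file('http://www.sitepoint.com/recent.rdf');
$feed->registerXPathNamespace("a", "http://www.domain.com/schema/data");
// Start at the document root (leading slash) and prefix every element name
$result = $feed->xpath("/a:Data/a:Something/...");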

Important: The URI used in the registerXPathNamespace call must be identical to the one that is used in the actual XML file.
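
To make the difference concrete, here is a small self-contained sketch using an inline copy of the question's document. The two `Something` values and the variable names are assumptions for the example. It compares an unprefixed query with a prefixed one after registering the namespace.

```php
<?php
// Inline sample shaped like the question's XML; the element values are made up.
$xmlString = <<<XML
<Data xmlns="http://www.domain.com/schema/data" xmlns:dmd="http://www.domain.com/schema/data-metadata">
  <Something>first</Something>
  <Something>second</Something>
</Data>
XML;

$data = simplexml_load_string($xmlString);

// Unprefixed names in XPath 1.0 mean "no namespace", so this matches nothing.
$unprefixed = $data->xpath('/Data/Something');

// Register a prefix for the default namespace URI and use it on every step.
$data->registerXPathNamespace('a', 'http://www.domain.com/schema/data');
$prefixed = $data->xpath('/a:Data/a:Something');

echo count($unprefixed) . ' unprefixed, ' . count($prefixed) . ' prefixed' . PHP_EOL;
```

The prefix `a` is arbitrary; only the URI has to match the one declared in the document.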

Alex
null
2

The following PHP code automatically detects the default namespace specified in the XML file and registers it under the alias "default". XPath queries then have to include the default: prefix whenever such a namespace is present.

So if you want to read XML files whether or not they contain a default namespace definition, and you want to query all Something elements, you could use the following code:

$xml = simplexml_load_file($name);
$namespaces = $xml->getDocNamespaces();
if (isset($namespaces[''])) {
    $defaultNamespaceUrl = $namespaces[''];
    $xml->registerXPathNamespace('default', $defaultNamespaceUrl);
    $nsprefix = 'default:';
} else {
    $nsprefix = '';
}

$somethings = $xml->xpath('//'.$nsprefix.'Something');

echo count($somethings).' times found';
Alex
2

When you just want your XML parsed and ready to use, and you don't care about namespaces, you can simply remove them. Regular expressions work well and are much faster than my method below.

But for a safer approach to removing namespaces, you could parse the XML with SimpleXML and ask it which namespaces the document declares, like below:

$xml = '...';
$namespaces = simplexml_load_string($xml)->getDocNamespaces(true);
// getDocNamespaces() returns the default namespace under an empty key, like this: '' => 'url'
// So the filter below drops the default namespace and keeps only the prefixes
$namespaces = array_filter(array_keys($namespaces), function($k){return !empty($k);});
$namespaces = array_map(function($ns){return "$ns:";}, $namespaces);

$ns_clean_xml = str_replace("xmlns=", "ns=", $xml);
$ns_clean_xml = str_replace($namespaces, array_fill(0, count($namespaces), ''), $ns_clean_xml);
$xml_obj = simplexml_load_string($ns_clean_xml);

This way you replace only the namespace prefixes and avoid removing anything else the XML might contain.
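
As a rough trace of the steps above, the sketch below runs them on a document shaped like the one in the question and keeps the intermediate values. The `dmd:Info` element and the variable names `$prefixes` and `$search` are assumptions for the example.

```php
<?php
// Sample input; the dmd:Info element is a hypothetical addition for illustration.
$xml = '<Data xmlns="http://www.domain.com/schema/data" '
     . 'xmlns:dmd="http://www.domain.com/schema/data-metadata">'
     . '<dmd:Info>meta</dmd:Info><Something>value</Something></Data>';

// Maps prefix => URI; the default namespace is stored under the empty key ''.
$namespaces = simplexml_load_string($xml)->getDocNamespaces(true);

// Keep only the real prefixes and turn each into the "prefix:" text to strip.
$prefixes = array_filter(array_keys($namespaces), function ($k) { return $k !== ''; });
$search   = array_map(function ($ns) { return "$ns:"; }, $prefixes);

// Neutralise the default namespace, then remove every "prefix:" occurrence.
$ns_clean_xml = str_replace('xmlns=', 'ns=', $xml);
$ns_clean_xml = str_replace($search, '', $ns_clean_xml);

$xml_obj = simplexml_load_string($ns_clean_xml);
$info    = (string) $xml_obj->Info; // reachable without any namespace handling
```

Because this is plain string replacement, a `prefix:` sequence inside text content or attribute values would be removed as well.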

In practice, I use it as a function:

function refined_simplexml_load_string($xml_string) {
  if(false === ($x1 = simplexml_load_string($xml_string)) ) return false;
  
  $namespaces = array_keys($x1->getDocNamespaces(true));
  $namespaces = array_filter($namespaces, function($k){return !empty($k);});
  $namespaces = array_map(function($ns){return "$ns:";}, $namespaces);
  
  return simplexml_load_string($ns_clean_xml = str_replace(
    array_merge(["xmlns="], $namespaces),
    array_merge(["ns="], array_fill(0, count($namespaces), '')),
    $xml_string
  ));
}
  • Thanks a lot for sharing your solution. I had some other method for doing this (PHP 7.2), and it was serving me well for years. However, for some weird reason it wasn't really doing any kind of cleanup in PHP 8.1. I could not find anything relevant between releases, but your method works with both PHP versions – Oliver Maksimovic Aug 05 '23 at 19:10
0

To remove the namespace completely, you'll need to use Regular Expressions (RegEx). For example:

$feed = file_get_contents("http://www.sitepoint.com/recent.rdf");
$feed = preg_replace('/\sxmlns\s*=\s*(["\']).*?\1/i', '', $feed); // This removes ALL default namespace declarations but keeps the surrounding tags.
$xml_feed = simplexml_load_string($feed);

This strips any default XML namespaces before you load the XML. Be careful with the regex though, because if you have any fields containing something like:

<![CDATA[ <Transfer xmlns="http://redeux.example.com">cool.</Transfer> ]]>

Then it will strip the xmlns from inside the CDATA which may lead to unexpected results.
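
The CDATA caveat can be reproduced with a small sketch. The input document and the pattern below are assumptions chosen for illustration; any regex that works on the raw string has the same blind spot, because it cannot tell markup from CDATA text.

```php
<?php
// Hypothetical input: a namespaced root plus a CDATA payload that contains markup.
$feed = '<Root xmlns="http://www.domain.com/schema/data"><Body><![CDATA[ '
      . '<Transfer xmlns="http://redeux.example.com">cool.</Transfer>'
      . ' ]]></Body></Root>';

// Illustrative pattern for this sketch: remove every xmlns="..." attribute.
$stripped = preg_replace('/\sxmlns\s*=\s*(["\']).*?\1/i', '', $feed);

// LIBXML_NOCDATA merges the CDATA section into ordinary text for easy reading.
$xml  = simplexml_load_string($stripped, 'SimpleXMLElement', LIBXML_NOCDATA);
$body = (string) $xml->Body;

// The payload lost its xmlns attribute even though it was only text.
echo $body . PHP_EOL;
```

If the CDATA payload has to survive byte for byte, a parser-based approach such as registering the namespace is the safer route.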

n00dle
null