I am trying to develop a function that removes certain URL nodes from my sitemap file. Here is what I have so far.

$xpath = new DOMXpath($DOMfile);
$elements = $xpath->query("/urlset/url/loc[contains(.,'$pageUrl')]");
echo count($elements);
foreach($elements as $element){
    //this is where I want to delete the URL
    echo $element;
    echo "here".$element->nodeValue;
}

Which outputs "111111". I don't know why I can't echo a string in a foreach loop if the $elements count is '1'.

Up until now, I've been doing

$urls = $dom->getElementsByTagName( "url" );
foreach( $urls as $url ){
    $locs = $url->getElementsByTagName( "loc" );
    $loc = $locs->item(0)->nodeValue;
    echo $loc;
    if($loc == $fullPageUrl){
        $removeUrl = $dom->removeChild($url);
    }
}

Which would work fine if my sitemap wasn't so big. It times out right now, so I'm hoping using xpath queries will be faster.

After Gordon's comment, I tried:

$xpath = new DOMXpath($DOMfile);
$query = sprintf('/urlset/url[./loc = "%d"]', $pageUrl);
foreach($xpath->query($query) as $element) {
    //this is where I want to delete the URL
    echo $element;
    echo "here".$element->nodeValue;
}

And it's not returning anything.

I tried going a step further and used codepad, using what was used in the other post mentioned, and did this:

<?php error_reporting(-1);
$xml = <<< XML
<?xml version="1.0" encoding="UTF-8" ?>
<url>
<loc>professional_services</loc>
<loc>5professional_services</loc>
<loc>6professional_services</loc>
</url>
XML;
$id = '5professional_services';
$dom = new DOMDocument;
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);
$query = sprintf('/url/[loc = $id]');
foreach($xpath->query($query) as $record) {
     $record->parentNode->removeChild($record);
}
echo $dom->saveXml();

And I'm getting a "Warning: DOMXPath::query(): Invalid expression" at the foreach loop line. Thanks for the other comment on the urlset; I'll be sure to include the double slashes in my code. I tried it and it returned nothing.

ctrygstad
  • possible duplicate of [delete child node in xml file with php](http://stackoverflow.com/questions/4667433/delete-child-node-in-xml-file-with-php) – Gordon Jan 20 '11 at 21:43
  • `$url` is a `DOMNodelist` not a `DOMElement`? And the list cannot be removed, maybe you need to iterate over the list and remove each element? – Jake N Jan 20 '11 at 21:54
  • I don't understand jakenoble. Do you think the code I had before that ran through the XML and compared all loc nodes to the php variable was the right way to go? Maybe I have faulty code in the way it is right now? – ctrygstad Jan 20 '11 at 22:09
  • @ctrygstad the reason I pointed you to that other question is that it shows how to actually remove the node. That part is missing from your example. It wasn't meant to suggest changing your XPath. We cannot tell you if your XPath is correct without seeing your XML. – Gordon Jan 20 '11 at 22:36
  • @ctrygstad watch my new edited answer. – Shikiryu Jan 21 '11 at 14:35
  • @Gordon: I agree. Conceptually, it's a duplicate, disregarding the namespace issue. –  Jan 21 '11 at 22:17

1 Answer

The XML of a sitemap should look like this:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc></loc>
...
</url>
<url>
<loc></loc>
...
</url>
...
</urlset>

Since it has a namespace, the query is a little more complicated than in my previous answer:

$xpath = new DOMXpath($DOMfile);
// Here register your namespace with a shortcut
$xpath->registerNamespace('sm', "http://www.sitemaps.org/schemas/sitemap/0.9");
// this request should work
$elements = $xpath->query('/sm:urlset/sm:url[sm:loc = "'.$pageUrl.'"]');

foreach($elements as $element){
    // This is a hint from the manual comments
    $element->parentNode->removeChild($element);
}
echo $DOMfile->saveXML();
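
For anyone who wants to try this without a real sitemap file, here is a minimal self-contained sketch of the same idea. The example.com URLs and the variable names `$removed` and `$remaining` are placeholders of mine, not part of the original code, and it assumes `$pageUrl` contains no double quote (otherwise the XPath string literal would break).

```php
<?php
// Hypothetical sitemap content; the example.com URLs are placeholders.
$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/keep</loc></url>
  <url><loc>http://example.com/remove-me</loc></url>
</urlset>
XML;

$pageUrl = 'http://example.com/remove-me';

$dom = new DOMDocument();
$dom->loadXML($xml);

$xpath = new DOMXPath($dom);
// Bind the sitemap namespace to the prefix "sm" (the prefix name is arbitrary).
$xpath->registerNamespace('sm', 'http://www.sitemaps.org/schemas/sitemap/0.9');

// Assumption: $pageUrl contains no double quote.
$nodes = $xpath->query('/sm:urlset/sm:url[sm:loc = "' . $pageUrl . '"]');

// DOMNodeList exposes the number of matches as ->length.
$removed = $nodes->length;

// Remove each matching <url> element through its parent.
foreach ($nodes as $node) {
    $node->parentNode->removeChild($node);
}

$remaining = $dom->getElementsByTagName('url')->length;
echo "removed: $removed, remaining: $remaining\n";
echo $dom->saveXML();
```

Reading `->length` rather than calling `count()` on the result is deliberate: as far as I know, `DOMNodeList` only became countable in later PHP versions, while `->length` has always been available.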

I'm writing this from memory just before going to bed. If it doesn't work, I'll test it tomorrow morning. (And yes, I'm aware that it could bring some downvotes.)

If you don't have a namespace (you should, but it's not an obligation, sigh):

$elements = $xpath->query('/urlset/url[loc = "'.$pageUrl.'"]');

Here is a concrete working example: http://codepad.org/vuGl1MAc
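
If the same code has to cope with sitemaps both with and without a namespace, the two queries can be wrapped in one function. This is only a sketch: `removeSitemapUrl` is a hypothetical helper name, the example.com URLs are placeholders, and it again assumes `$pageUrl` contains no double quote. It relies on `documentElement->namespaceURI` being empty when the root element has no namespace.

```php
<?php
// Hypothetical helper: removes every <url> whose <loc> equals $pageUrl
// and returns the number of removed nodes.
function removeSitemapUrl(DOMDocument $dom, string $pageUrl): int
{
    $xpath = new DOMXPath($dom);
    $ns = $dom->documentElement->namespaceURI;

    if ($ns !== null && $ns !== '') {
        // Namespaced sitemap: register a prefix and use it in every step.
        $xpath->registerNamespace('sm', $ns);
        $query = '/sm:urlset/sm:url[sm:loc = "' . $pageUrl . '"]';
    } else {
        // No namespace: plain element names are enough.
        $query = '/urlset/url[loc = "' . $pageUrl . '"]';
    }

    $nodes = $xpath->query($query);
    $count = $nodes->length;
    foreach ($nodes as $node) {
        $node->parentNode->removeChild($node);
    }
    return $count;
}

// Usage with a sitemap that has no namespace.
$plain = new DOMDocument();
$plain->loadXML(
    '<urlset>'
    . '<url><loc>http://example.com/a</loc></url>'
    . '<url><loc>http://example.com/b</loc></url>'
    . '</urlset>'
);
$removedPlain = removeSitemapUrl($plain, 'http://example.com/a');

// Usage with a namespaced sitemap.
$namespaced = new DOMDocument();
$namespaced->loadXML(
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    . '<url><loc>http://example.com/a</loc></url>'
    . '</urlset>'
);
$removedNamespaced = removeSitemapUrl($namespaced, 'http://example.com/a');

echo "plain: $removedPlain, namespaced: $removedNamespaced\n";
```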

oradwell
Shikiryu
  • Thanks! That worked perfect, didn't know you had to declare a namespace. I do have a namespace declared in my sitemap.xml file for the record. – ctrygstad Jan 21 '11 at 15:35