
I'm running preg_replace on content that I don't necessarily control, and I'm having trouble with replacement values that contain things like currency amounts (e.g. $1.00). Admittedly this is a common problem that has been addressed in other questions. The closest solution I've found is:

http://www.procata.com/blog/archives/2005/11/13/two-preg_replace-escaping-gotchas/

My problem is more complicated because the replacement value is not something I can escape ahead of time, at least not in a way I can see. Here's my preg code:

$body = preg_replace('/<special_tag id="'.$tagID.'">(.*?)<\/special_tag>/','$1',$body);

As you can see I'm capturing all content within a set custom tag, and removing the surrounding opening and closing tags, but keeping the content found inside. The replacement '$1' however doesn't lend itself to the escaping that is required, and so currency values that happen to be in the replacement values are getting terminated incorrectly.
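One way to sidestep replacement-string parsing altogether is preg_replace_callback, whose callback return value is inserted as-is rather than scanned for `$n` or `\n` references. The following is only a minimal sketch: the sample values for `$tagID` and `$body` are hypothetical, and the `preg_quote()` call and the `s` modifier (so that `.` also matches newlines) are assumptions added for illustration, not part of the original pattern.

```php
<?php
// Hypothetical sample inputs standing in for the real $tagID and $body.
$tagID = 'foo';
$body  = 'Price: <special_tag id="foo">$1.00 and \\2</special_tag> today';

// preg_quote() escapes regex metacharacters in the ID; '/' is the delimiter in use.
// The trailing "s" modifier is an assumption so that (.*?) can span several lines.
$pattern = '/<special_tag id="' . preg_quote($tagID, '/') . '">(.*?)<\/special_tag>/s';

// The string returned by the callback is used literally as the replacement,
// so "$1.00" or "\2" inside the captured content is never treated as a reference.
$result = preg_replace_callback(
    $pattern,
    function (array $matches) {
        return $matches[1]; // keep only the content between the tags
    },
    $body
);

echo $result, PHP_EOL;
```

The surrounding tags are dropped and the inner text is kept unchanged, including the dollar sign and the backslash.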

Have I overthought this replacement? Is there something else I can use to remove my special tags, keeping in mind that it must take into account the unique ID of that specific tag?

Any help would be greatly appreciated!

oucil
  • "Is there something else I can use to remove my special tags": uuuh, [how about a DOM parser](http://stackoverflow.com/questions/3577641/best-methods-to-parse-html/3577662#3577662)? – Gordon Jan 03 '13 at 16:07
  • I actually used that method in another part of the same script but it seemed overkill for this particular replacement, just trying to keep overhead down if I can, but if this is the only option, I will fall back on it. – oucil Jan 03 '13 at 16:15
  • it's not the *only* option, but it comes to (my) mind before approaching this with Regex. – Gordon Jan 03 '13 at 16:17
  • @Gordon Seems the DOM is the preferred method, thanks for your input! I imagine I can rewrite my current stuff to use it more efficiently :) Since there are no other answers, if you're willing, can you create one with the same link for others, and I'll accept it. – oucil Jan 04 '13 at 14:00
  • procata.com's solution is incomplete. The string $replacement = '$10+\\5' will not be handled correctly. – zylstra Oct 29 '13 at 01:27
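
For cases where an arbitrary string does have to be passed as the replacement argument, a possible approach is to backslash-escape both backslashes and dollar signs before handing it to preg_replace. This is a sketch only: the helper name `escape_preg_replacement` is hypothetical (it is not a PHP built-in and does not come from the linked article), and the placeholder pattern is made up for the demonstration.

```php
<?php
// Hypothetical helper: escapes a string so preg_replace inserts it literally.
function escape_preg_replacement(string $replacement): string
{
    // Prefix every backslash and every dollar sign with a backslash,
    // so neither "$10" nor "\5" is read as a backreference.
    return addcslashes($replacement, '\\$');
}

// The string from the comment above, written as a PHP single-quoted literal.
$replacement = '$10+\\5';

$escaped = escape_preg_replacement($replacement);
$result  = preg_replace('/PLACEHOLDER/', $escaped, 'Total: PLACEHOLDER');

echo $result, PHP_EOL;
```

The same escaping is not needed for text that arrives through a capture group, since only the replacement argument itself is scanned for references.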

2 Answers


Possible DOM solution that shouldn't have any of the "gotchas".

Assuming this HTML:

$html = <<< HTML
<html>
    <body>
        <special_tag id="foo">
            <p>Some content</p>
            <p>Some more content</p>
        </special_tag>
    </body>
</html>
HTML;

You pull up the children of special_tag and remove special_tag afterwards:

// create DOMDocument, suppress parsing errors
$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

// get special_tag with id foo
$xpath = new DOMXPath($dom);
$foo = $xpath->query('//special_tag[@id="foo"]')->item(0);

// move all children before special_tag
while ($foo->childNodes->length > 0) {
    $foo->parentNode->insertBefore($foo->childNodes->item(0), $foo);
}

// remove now empty special_tag
$foo->parentNode->removeChild($foo);

// output
echo $dom->saveHTML($dom->documentElement);

Will result in something like

<html><body>
    <p>Some content</p>
        <p>Some more content</p>
    </body></html>
Gordon
  • Thanks for the sample code as well, I'm sure others will appreciate it, and I'll also say, it's probably a much smarter move heading towards DOM than relying as heavily as I have on preg functions. Cheers! – oucil Jan 04 '13 at 18:23

Using Regex to parse XML/HTML is not recommended. Use a DOM parser instead.

Community
Madara's Ghost