
Example of markup:

<p> a paragraph </p>
<pre lang="html">
  &lt;p&gt; a paragraph &lt;/p&gt;
</pre>
<code lang="html">
  &lt;p&gt; a paragraph &lt;/p&gt;
</code>

How can I select everything between <pre>,</pre>,<code>,</code> and run a function on it? I need to pass 3 arguments to this function: the selected part of the string (&lt;p&gt; a paragraph &lt;/p&gt;), the container type (pre or code), and the attributes of the container (like lang="html").

The function should change the selected part of the string based on the other 2 parameters (if it's relevant, I want to run the GeSHi highlighter on it), then replace the original contents of the string, including the container, with the result. Something like:

<p> a paragraph </p>
<div class="html pre">
  &lt;p&gt; a paragraph &lt;/p&gt;
</div>
<div class="html code">
  &lt;p&gt; a paragraph &lt;/p&gt;
</div>
Lightness Races in Orbit
Alex
  • *(related)* [Best Methods to parse HTML](http://stackoverflow.com/questions/3577641/best-methods-to-parse-html/3577662#3577662) – Gordon Apr 02 '11 at 17:37
  • Is this a full HTML page with a root element or only a partial as shown above? – Gordon Apr 02 '11 at 17:40
  • No, it's a partial HTML block; it's basically an article or a comment containing code samples... – Alex Apr 02 '11 at 17:43

1 Answer

I think it should go like this:

$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$elements = $xpath->query('//pre | //code');

In some cases (e.g. if you use getElementsByTagName instead of XPath), the returned node list is live, so replacing nodes while you iterate over it can skip elements (see this question). To get the proper behaviour, copy the nodes to an array first. I'll do that for this example:

$array = array();
foreach ($elements as $element) {
    $array[] = $element;
}

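foreach ($array as $element) {
    // Collect the three arguments: container type, decoded text, lang attribute
    $tag = $element->tagName;
    $content = $element->textContent;
    $lang = $element->getAttribute('lang');
    $new_content = my_function($tag, $content, $lang);

    // Build the replacement <div>; the class order matches the desired "html pre"
    $new_element = $dom->createElement('div');
    $new_element->setAttribute('class', "$lang $tag");
    // A text node is escaped on output; assigning nodeValue warns on a bare "&"
    $new_element->appendChild($dom->createTextNode($new_content));
    $element->parentNode->replaceChild($new_element, $element);
}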

Of course, my_function is undefined in the example above, but this should give you a good idea of how to do it.
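
To get the processed string back out, the document can be serialized with saveHTML(). What follows is a minimal sketch under a few assumptions, not a definitive implementation: process_code_blocks and the fragment-root wrapper div are names made up for illustration, and my_function is a placeholder that returns its input unchanged where a real highlighter call would go. The class order follows the desired output in the question (language first, then tag).

```php
<?php
// Placeholder for the real transformation (e.g. a highlighter call); an assumption.
function my_function($tag, $content, $lang)
{
    return $content;
}

// Hypothetical wrapper: rewrites every <pre>/<code> in an HTML fragment
// and returns the processed fragment as a string.
function process_code_blocks($html)
{
    $previous = libxml_use_internal_errors(true); // silence warnings on sloppy markup
    $dom = new DOMDocument;
    // loadHTML() adds <html><body> around fragments, so wrap the input in a
    // known container that can be found again afterwards.
    $dom->loadHTML('<div id="fragment-root">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Copy the nodes into an array before replacing any of them.
    $elements = iterator_to_array($xpath->query('//pre | //code'));

    foreach ($elements as $element) {
        $tag  = $element->tagName;
        $lang = $element->getAttribute('lang');
        $new_content = my_function($tag, $element->textContent, $lang);

        $new_element = $dom->createElement('div');
        $new_element->setAttribute('class', trim("$lang $tag"));
        // A text node escapes <, > and & when the document is serialized.
        $new_element->appendChild($dom->createTextNode($new_content));
        $element->parentNode->replaceChild($new_element, $element);
    }

    // Serialize only the children of the wrapper, not the added <html><body>.
    $root = $xpath->query('//div[@id="fragment-root"]')->item(0);
    $result = '';
    foreach ($root->childNodes as $child) {
        $result .= $dom->saveHTML($child);
    }
    return $result;
}

$processed = process_code_blocks(
    '<p> a paragraph </p><pre lang="html">&lt;p&gt; a paragraph &lt;/p&gt;</pre>'
);
echo $processed;
```

Wrapping the input in a known container is one way to deal with partial HTML, since loadHTML() always builds a full document around it. Also note that loadHTML() assumes ISO-8859-1 unless the markup declares a charset, so non-ASCII input may need an explicit encoding hint.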

Note that this won't work on nested elements, like these:

<pre lang="html">
  <p>some nested element</p>
  &lt;p&gt; a paragraph &lt;/p&gt;
</pre>

If you want to work on nested elements, use a helper function that returns the element's inner HTML instead of reading $element->textContent, which keeps only the text of nested elements and drops their markup.
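
PHP's DOM extension has no built-in innerHTML property on elements in the versions this answer targets, so such a helper has to be written by hand. A minimal sketch, assuming a hypothetical helper named inner_html that serializes each child node with saveHTML():

```php
<?php
// Hypothetical helper: returns the markup of all children of $element.
function inner_html(DOMNode $element)
{
    $html = '';
    foreach ($element->childNodes as $child) {
        // saveHTML($node) serializes just that node and its subtree.
        $html .= $element->ownerDocument->saveHTML($child);
    }
    return $html;
}

$dom = new DOMDocument;
$dom->loadHTML('<pre lang="html"><b>nested</b> &lt;p&gt; a paragraph &lt;/p&gt;</pre>');
$pre = $dom->getElementsByTagName('pre')->item(0);

// Unlike textContent, this keeps the <b> tags and leaves the entities escaped.
$inner = inner_html($pre);
echo $inner;
```

The string returned here is markup rather than plain text, so a function receiving it would have to decide what to do with the nested tags before highlighting.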

netcoder
  • thank you. I'm sorry to be so stupid, but how do I get the processed string? :) `$new_content` only has the code stuff in it – Alex Apr 02 '11 at 19:36