
I have several HTML paragraphs like this (always the same structure):

```html
<p>
    <!-- Gl. 1-4 -->
    \( x(t) = x_0 · t^3 \)
    [!equanchor? &id=`555`!]
</p>
```

I am successfully extracting the 555 with:

```php
$xpath = new DOMXPath($dom);
$paragraphs = $xpath->query('//p');
$equationids = [];
foreach ($paragraphs as $p)
{
    $ptext = $p->nodeValue;
    if (strpos($ptext, 'equanchor') !== false)
    {
        // get equation id from anchor, e.g. [!equanchor? &id=`555`!]
        if (preg_match('/equanchor\?\s&id=`(\d+)`/', $ptext, $matches))
        {
            $equationids[] = (int)$matches[1];
        }
    }
}
```

Now I also need the text from the HTML comment, i.e. `<!-- Gl. 1-4 -->`, but I couldn't figure out how to get it with the DOM parser (DOMXPath). Unfortunately, neither `$p->nodeValue` nor `$p->textContent` contains the comment text.
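This behaviour is expected. An element's `textContent` is built only from its descendant text nodes, and in PHP the `nodeValue` of a `DOMElement` equals its `textContent`. The comment, by contrast, is a separate child node of type `XML_COMMENT_NODE`.

The minimal sketch below illustrates this and shows that the comment is still reachable by walking `$p->childNodes`. It assumes the paragraph is loaded from an inline HTML string; the markup and the variable names `$text` and `$commentText` are illustrative, not from the original code.

```php
<?php
// Illustrative markup with the same structure as in the question.
$html = '<p><!-- Gl. 1-4 --> \( x(t) = x_0 \) [!equanchor? &id=`555`!]</p>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // silence warnings about non-standard markup
$dom->loadHTML($html);
libxml_clear_errors();

$p = $dom->getElementsByTagName('p')->item(0);

// textContent only concatenates text nodes, so "Gl. 1-4" is not in here.
$text = $p->textContent;

// The comment is still a child node of <p>; find it by node type.
$commentText = null;
foreach ($p->childNodes as $child) {
    if ($child->nodeType === XML_COMMENT_NODE) {
        // A comment's nodeValue is the text between "<!--" and "-->".
        $commentText = trim($child->nodeValue);
        break;
    }
}

var_dump(strpos($text, 'Gl.') === false); // bool(true)
echo $commentText, PHP_EOL;               // Gl. 1-4
```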

This answer did not help me. I tried a "sub parser", but it could not read `$ptext` or `$p`.

Avatar

1 Answer


You can use the XPath `comment()` node test (from Accessing Comments in XML using XPath).

So in your case, when you want to get the comment in the <p> tag, you can just add the line...

```php
echo $dom->saveHTML($xpath->query("comment()", $p)[0]);
```

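inside your `foreach` loop. Passing `$p` as the second argument to `query()` makes the expression relative to that paragraph, so it fetches the comment node(s) directly inside the `<p>`. `[0]` takes the first one (assuming there is only one).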

Which outputs...

```html
<!-- Gl. 1-4 -->
```
Nigel Ren
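One caveat with the one-liner above: if a paragraph contains no comment, there is no node to serialize. `DOMNodeList::item(0)` returns `null` for an empty list, and calling `saveHTML()` with `null` serializes the entire document instead.

Below is a minimal sketch of the loop with an explicit check. It assumes illustrative markup and uses a hypothetical `$found` array to hold the serialized comments.

```php
<?php
// Illustrative markup: one paragraph with a comment, one without.
$html = '<p><!-- Gl. 1-4 --> [!equanchor? &id=`555`!]</p>'
      . '<p>No comment here</p>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // silence warnings about non-standard markup
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$found = []; // hypothetical container for the serialized comments
foreach ($xpath->query('//p') as $p) {
    // "comment()" is evaluated relative to $p, so it only matches
    // comment nodes that are direct children of this paragraph.
    $comment = $xpath->query('comment()', $p)->item(0);
    if ($comment === null) {
        // Skip: saveHTML(null) would serialize the whole document.
        continue;
    }
    $found[] = $dom->saveHTML($comment);
}

print_r($found);
```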
  • Lovely. Thanks a lot! It never occurred to me to use `saveHTML()` for this. PS: I added a simple `$comment = str_replace(['<!--', '-->'], '', $comment);` to get the inner text "Gl. 1-4". – Avatar May 03 '18 at 15:13
  • If you just need the content, `$comment = $xpath->query("comment()", $p)[0]->textContent;` may be neater. – Nigel Ren May 03 '18 at 15:16
  • That does not work inside the `saveHTML()` call. But the solution is already fantastic :) – Avatar May 03 '18 at 15:18
  • `textContent` is a property of the DOMNode itself, so you don't need to convert it back to HTML with `saveHTML()` at all. – Nigel Ren May 03 '18 at 15:20
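Following the `textContent` suggestion, both values can be collected in a single pass. This also removes the need for the `str_replace()` step; only a `trim()` of the surrounding spaces remains.

The minimal sketch below assumes illustrative markup. It uses a hypothetical `$labels` array that maps each equation id to its comment text, or to `null` when a paragraph has no comment.

```php
<?php
// Illustrative markup with the structure from the question.
$html = '<p><!-- Gl. 1-4 --> \( x(t) = x_0 \) [!equanchor? &id=`555`!]</p>'
      . '<p><!-- Gl. 1-5 --> \( v(t) = v_0 \) [!equanchor? &id=`556`!]</p>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // silence warnings about non-standard markup
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$labels = []; // hypothetical map: equation id => comment text
foreach ($xpath->query('//p') as $p) {
    if (!preg_match('/equanchor\?\s&id=`(\d+)`/', $p->nodeValue, $matches)) {
        continue; // not an equation paragraph
    }
    $comment = $xpath->query('comment()', $p)->item(0);
    // A comment node's textContent is the text between "<!--" and "-->",
    // including the surrounding spaces, hence trim().
    $labels[(int)$matches[1]] = $comment !== null ? trim($comment->textContent) : null;
}

var_dump($labels);
```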