I've got a webpage that I would like to modify by code (adding links on specific words).

The HTML code:

<div class="section">
<h2>Notre histoire</h2>
<p style="text-align: justify;">SPECIFICS WORDS<strong>1998 : la création</strong></p>
<p style="text-align: justify;">pour objectif « de promouvoir, selon une démarche d’éducation active, auprès des jeunes et à travers eux, des projets d’expression collective et d’action de solidarité » (article 2).<br><br><strong>1999-2001 : les débuts SPECIFICS WORDS</strong></p>
<p style="text-align: justify;">SPECIFICS WORDS<a href="#">SPECIFICS WORDS</a></p>
</div>

So my aim is to run preg_replace on SPECIFIC WORDS, but only on occurrences that are directly inside a P, not inside an A, a STRONG, or any other tag.

I can't rely on any class or id, because I don't know the markup in advance. I tried the PHP preg_replace function on the whole page, but it didn't work and took too long to execute.

So my question is: how do I select a node with XPath without its A, STRONG, and IMG children?

LiliwoL
  • My first impression is that unless you're using XHTML and can guarantee that there are no special characters (like ` `), you're going to have trouble getting it to process via XPath, as it would have to conform to XML standards. I could be wrong though (it has been known!) – freefaller Jun 22 '12 at 09:50
  • In general the XPath expression to select a node that is in A but not in B is `A//node()[not(ancestor::B)]`. If you want text nodes only, you need to replace `node()` with `text()`. – biziclop Jun 22 '12 at 09:56
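
The pattern from the comment above can be tried out with a small sketch like the following. The sample markup and the variable names are made up for illustration; the only assumption is PHP's bundled DOM extension.

```php
<?php
// Hypothetical sample markup, not the asker's real page.
$html = '<p>plain <em>emphasised <a href="#">linked</a></em> and <strong>bold</strong> tail</p>';

$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// All text nodes at any depth below a <p>, except those inside <a> or <strong>.
$query = '//p//text()[not(ancestor::a) and not(ancestor::strong)]';

$kept = [];
foreach ($xpath->query($query) as $textNode) {
    $kept[] = $textNode->nodeValue;
}
print_r($kept);
```

Unlike `//p/text()`, this variant also keeps text nested in other inline elements (here the `<em>`), which may or may not be what is wanted.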

2 Answers

You cannot select nodes without their children. A node is a subtree, unless it is a leaf, in which case it has no children. To select the text node leaves that contain the word "SPECIFIC" and are direct children of P elements, use

//p/text()[contains(.,'SPECIFIC')]

This will exclude the text nodes inside other elements, e.g. in strong or a.

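To replace the word inside those text nodes, use

$dom = new DOMDocument;
// $html is assumed to hold the page markup as a string
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//p/text()[contains(.,"SPECIFIC")]') as $textNode) {
    // replace only the matched word and keep the rest of the text node
    $textNode->nodeValue = str_replace('SPECIFIC', 'REPLACED', $textNode->nodeValue);
}
echo $dom->saveHTML();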
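
Since the question is about adding links rather than swapping text, the same query can be combined with a split of each text node. This is only a sketch: `$word`, `$url`, and the sample markup are hypothetical placeholders, and the word is assumed not to contain a double quote (it is pasted into the XPath string).

```php
<?php
// Hypothetical inputs for illustration only.
$html = '<div><p>Some SPECIFIC text <strong>SPECIFIC bold</strong> and <a href="#">SPECIFIC link</a></p></div>';
$word = 'SPECIFIC';
$url  = 'https://example.com/';

$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Copy the matches to an array before modifying the tree.
$textNodes = iterator_to_array($xpath->query('//p/text()[contains(., "' . $word . '")]'));

foreach ($textNodes as $textNode) {
    // Split around the word, keeping the word itself as its own piece.
    $pattern = '/(' . preg_quote($word, '/') . ')/';
    $parts = preg_split($pattern, $textNode->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);

    $fragment = $dom->createDocumentFragment();
    foreach ($parts as $part) {
        if ($part === $word) {
            // Wrap each occurrence of the word in a new <a> element.
            $link = $dom->createElement('a');
            $link->setAttribute('href', $url);
            $link->appendChild($dom->createTextNode($part));
            $fragment->appendChild($link);
        } elseif ($part !== '') {
            $fragment->appendChild($dom->createTextNode($part));
        }
    }
    // Swap the original text node for the text/link sequence.
    $textNode->parentNode->replaceChild($fragment, $textNode);
}

$result = $dom->saveHTML();
echo $result;
```

Only the occurrence directly inside the P is linked; the ones inside STRONG and the existing A are left alone because `//p/text()` never reaches them.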

Also see DOMDocument in PHP and this XPath Tutorial.

Gordon

If I understand correctly, you want to select all nodes in the XML document that are direct children of a <p> element, without any other elements in between. This is possible as follows:

`//p/node()[not(self::*)]`

This expression selects

  1. in all <p> elements
  2. the immediate child nodes (without any intermediate levels)
  3. unless they are elements.
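
To see what this expression picks up, here is a small sketch using PHP's DOM extension. The markup and variable names are invented for the example.

```php
<?php
// Hypothetical sample markup.
$html = '<p>before <strong>bold</strong> middle <a href="#">link</a> after</p>';

$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Direct children of <p> that are not element nodes.
$selected = [];
foreach ($xpath->query('//p/node()[not(self::*)]') as $node) {
    $selected[] = $node->nodeValue;
}
print_r($selected);
```

Note that `node()` also matches comments and processing instructions; if only text is wanted, `//p/text()` is the narrower choice.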
O. R. Mapper