How to get text from an element, excluding some other elements inside it

Question

I'm using the DomCrawler component in the Symfony framework to crawl content from HTML. Now I need to get the text inside an element with a given ID. I'm able to fetch the text using the code below:

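use Symfony\Component\DomCrawler\Crawler;

// Collect the full text of the matched element, including text from its children.
$nodeValues = $crawler1->filter('#idOfTheElement')->each(function (Crawler $node, $i) {
    return $node->text();
});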

The element (#idOfTheElement) contains some spans, buttons, etc. (some of which also have classes). I don't want the contents of those. How can I get the text from the element while excluding the other elements inside it?

Note: The text I want to fetch has no wrapper other than the element #idOfTheElement.

The HTML looks like this:

<li id='idOfTheElement'>Tel :<button data-pjtooltip="{dtanchor:'tooltipOpposeMkt'}" class="noMkt JS_PJ" type="button">text :</button><dl><dt><a name="tooltipOpposeMkt"></a></dt><dd><div class="wrapper"><p><strong>Signification des pictogrammes</strong></p><p>Devant un numéro, le picto <img width="11" height="9" alt="" src="something"> signale une opposition aux opérations de marketing direct.</p><span class="arrow">&nbsp;</span></div></dd></dl>12 23 45 88 99</li>

score 4 · Accepted Answer · answered May 06 '15 at 15:24

You can get the element's HTML and then strip out the child elements together with their contents:

preg_replace('@<(\w+)\b.*?>.*?</\1>@si', '', $node->html());
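
As a rough check of the regex above, the sketch below runs it on a shortened version of the inner HTML from the question. The helper name `stripChildElements` and the trimmed-down markup are assumptions made for illustration only.

```php
<?php
// Hypothetical helper wrapping the regex from the answer.
function stripChildElements(string $innerHtml): string
{
    // Remove each <tag ...>...</tag> pair together with everything inside it.
    return preg_replace('@<(\w+)\b.*?>.*?</\1>@si', '', $innerHtml);
}

// Shortened inner HTML of the <li> from the question (assumed sample).
$innerHtml = 'Tel :<button class="noMkt JS_PJ" type="button">text :</button>'
    . '<dl><dt><a name="tooltipOpposeMkt"></a></dt>'
    . '<dd><div class="wrapper"><p><strong>Signification des pictogrammes</strong></p></div></dd></dl>'
    . '12 23 45 88 99';

$text = stripChildElements($innerHtml);
echo $text, PHP_EOL; // Tel :12 23 45 88 99
```

Because the pattern stops at the first matching closing tag, it can leave fragments behind when elements with the same tag name are nested directly inside each other, and it does not touch tags that have no closing counterpart.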

Konstantin Pereiaslov


Those inner elements also contain text, and I don't want any of it. In that case this will not work – arun May 07 '15 at 04:13
This should remove text inside those elements. Can you give example HTML? – Konstantin Pereiaslov May 07 '15 at 22:16
Tel : Signification des pictogrammes Devant un numéro, le picto signale une opposition aux opérations de marketing direct. 12 23 45 88 99 – arun May 08 '15 at 04:34
@nu6a This is invalid HTML, there is a closing ``, but not the opening, so my regex outputs `text :12 23 45 88 99` for it, but for valid HTML it should work. If you have invalid HTML like that though, maybe start by passing it through [HTMLPurifier](https://github.com/ezyang/htmlpurifier)? – Konstantin Pereiaslov May 08 '15 at 12:41
The answer is working for me now. But I would like to know whether DomCrawler has any selector like `not` – arun May 11 '15 at 07:48

score 1 · Answer 2 · answered Sep 20 '17 at 16:35

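First remove the child elements:

// children() selects the element children of #idOfTheElement; without it,
// the loop would detach #idOfTheElement itself and the text() call below would fail.
$crawler1->filter('#idOfTheElement')->children()->each(function (Crawler $child) {
    foreach ($child as $node) {
        // Detach the child element; the parent's own text nodes stay in place.
        $node->parentNode->removeChild($node);
    }
});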

Then get text without child nodes:

$cleanContent = $crawler1->filter('#idOfTheElement')->text();
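
If modifying the document is undesirable, a different technique is to read only the element's direct text nodes with the XPath `text()` step. The sketch below uses PHP's built-in `DOMDocument` and `DOMXPath` rather than DomCrawler so that it runs on its own; the helper name `directText` and the shortened markup are assumptions for illustration. DomCrawler's `filterXPath()` should accept a similar expression, though that is not verified here.

```php
<?php
// Hypothetical helper: join the direct text-node children of the element
// with the given id, ignoring text inside nested elements.
// Assumes $id contains no double quotes (it is interpolated into the XPath).
function directText(string $html, string $id): string
{
    // Collect parser warnings internally instead of emitting them.
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $parts = [];
    // "/text()" matches only text nodes that are immediate children.
    foreach ($xpath->query(sprintf('//*[@id="%s"]/text()', $id)) as $textNode) {
        $parts[] = trim($textNode->nodeValue);
    }

    // Drop empty fragments and join the rest with single spaces.
    return implode(' ', array_filter($parts, 'strlen'));
}

// Shortened, ASCII-only version of the markup from the question (assumed sample).
$html = '<ul><li id="idOfTheElement">Tel :<button type="button">text :</button>'
    . '<dl><dt><a name="tooltipOpposeMkt"></a></dt><dd><p>Signification des pictogrammes</p></dd></dl>'
    . '12 23 45 88 99</li></ul>';

echo directText($html, 'idOfTheElement'), PHP_EOL; // Tel : 12 23 45 88 99
```

Unlike the removal approach, this leaves the DOM untouched, so other queries against the same document still see the child elements.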
