
I'm using DomCrawler in the Symfony framework and have crawled content from HTML with it. Now I need to get the text inside an element with an ID. I'm able to fetch the text with the code below:

```php
use Symfony\Component\DomCrawler\Crawler;

$nodeValues = $crawler1->filter('#idOfTheElement')->each(function (Crawler $node, $i) {
    return $node->text();
});
```

The element (#idOfTheElement) contains some spans, buttons, etc. (some of which also have classes). I don't want the content inside those. How can I get the text of an element while excluding the other elements inside it?

Note: the text I want to fetch has no wrapper other than the #idOfTheElement element itself.

The HTML looks like this:

```html
<li id='idOfTheElement'>Tel :<button data-pjtooltip="{dtanchor:'tooltipOpposeMkt'}" class="noMkt JS_PJ" type="button">text :</button><dl><dt><a name="tooltipOpposeMkt"></a></dt><dd><div class="wrapper"><p><strong>Signification des pictogrammes</strong></p><p>Devant un numéro, le picto <img width="11" height="9" alt="" src="something"> signale une opposition aux opérations de marketing direct.</p><span class="arrow">&nbsp;</span></div></dd></dl>12 23 45 88 99</li>
```
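To see why the text comes back with too much in it, here is a minimal sketch. It uses PHP's built-in DOM extension rather than DomCrawler, an assumption made so it runs without Symfony, on a trimmed copy of the HTML above with the id written as `idOfTheElement`. An element's `textContent` concatenates the text of all its descendants, so the button label and the tooltip text end up in the result. DomCrawler's `text()` appears to behave the same way for this markup.

```php
<?php
// Minimal sketch with PHP's built-in DOM extension (not DomCrawler).
// Assumption: a trimmed copy of the question's HTML, id written without '#'.
libxml_use_internal_errors(true); // suppress libxml warnings about loose HTML

$html = '<ul><li id="idOfTheElement">Tel :<button type="button">text :</button>'
    . '<dl><dt><a name="tooltipOpposeMkt"></a></dt><dd><div class="wrapper">'
    . '<p><strong>Signification des pictogrammes</strong></p></div></dd></dl>'
    . '12 23 45 88 99</li></ul>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$li = $xpath->query('//*[@id="idOfTheElement"]')->item(0);

// textContent joins the text of ALL descendants, including the <button>
// label and the tooltip paragraph inside the <dl>.
$allText = $li->textContent;
echo $allText, PHP_EOL;
```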
arun

2 Answers


You can get the element's HTML and then remove the child elements together with their content:

```php
preg_replace('@<(\w+)\b.*?>.*?</\1>@si', '', $node->html());
```
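A minimal sketch of the regex applied to the question's markup. Here the element's inner HTML is hard-coded as a trimmed string, an assumption standing in for what `$node->html()` would return, so the sketch runs without DomCrawler. It also shows one limitation: the lazy `.*?` stops at the first matching closing tag, so nested elements with the same tag name leave debris behind.

```php
<?php
// Hard-coded, trimmed inner HTML of #idOfTheElement (assumption: in the
// answer this string comes from $node->html()).
$innerHtml = 'Tel :<button class="noMkt JS_PJ" type="button">text :</button>'
    . '<dl><dt><a name="tooltipOpposeMkt"></a></dt><dd><div class="wrapper">'
    . '<p><strong>Signification des pictogrammes</strong></p></div></dd></dl>'
    . '12 23 45 88 99';

// Remove every <tag ...>...</tag> pair, including everything between them.
$text = preg_replace('@<(\w+)\b.*?>.*?</\1>@si', '', $innerHtml);
echo $text, PHP_EOL; // Tel :12 23 45 88 99

// Limitation: with nested same-name tags, the first </span> ends the match,
// so the outer closing tag and the text before it survive.
$nested = 'keep<span>a<span>b</span>c</span>';
$nestedResult = preg_replace('@<(\w+)\b.*?>.*?</\1>@si', '', $nested);
echo $nestedResult, PHP_EOL; // keepc</span>
```

For markup like the question's, where the wanted text sits directly in the element, the regex is enough; for arbitrary HTML, a DOM-based approach avoids the nesting problem.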
Konstantin Pereiaslov
  • The elements inside also contain text, and I don't want any of it. In that case this will not work. – arun May 07 '15 at 04:13
  • This should remove text inside those elements. Can you give example HTML? – Konstantin Pereiaslov May 07 '15 at 22:16
  • Tel :

    Signification des pictogrammes

    Devant un numéro, le picto signale une opposition aux opérations de marketing direct.

    12 23 45 88 99
    – arun May 08 '15 at 04:34
  • @nu6a This is invalid HTML: there is a closing `` but no opening one, so for it my regex outputs `text :12 23 45 88 99`. For valid HTML it should work, though. If you do have invalid HTML like that, maybe start by passing it through [HTMLPurifier](https://github.com/ezyang/htmlpurifier)? – Konstantin Pereiaslov May 08 '15 at 12:41
  • The answer works for me now. But I would like to know whether there are any selectors like `not` in DomCrawler. – arun May 11 '15 at 07:48
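On the `not` question: CSS selectors, including `:not()`, can only match elements, so they cannot remove the text of matched descendants from `text()`. XPath, however, can address text nodes directly with the `text()` node test, and DomCrawler accepts XPath expressions through `filterXPath()`. Below is a minimal sketch of the XPath idea. It uses PHP's built-in `DOMXPath` so it runs without Symfony, on a trimmed copy of the question's HTML with the id written without the `#`; both are assumptions.

```php
<?php
// Sketch: select only the text nodes that are direct children of the element.
// Uses PHP's built-in DOMXPath (not DomCrawler); trimmed sample HTML.
libxml_use_internal_errors(true); // tolerate loose HTML without warnings

$html = '<ul><li id="idOfTheElement">Tel :<button type="button">text :</button>'
    . '<dl><dt><a name="tooltipOpposeMkt"></a></dt><dd><div class="wrapper">'
    . '<p><strong>Signification des pictogrammes</strong></p></div></dd></dl>'
    . '12 23 45 88 99</li></ul>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// ".../text()" matches text nodes whose parent is the li itself, so text
// inside the <button> and the <dl> is never selected.
$parts = [];
foreach ($xpath->query('//*[@id="idOfTheElement"]/text()') as $textNode) {
    $value = trim($textNode->nodeValue);
    if ($value !== '') { // skip whitespace-only text nodes
        $parts[] = $value;
    }
}
$ownText = implode(' ', $parts);
echo $ownText, PHP_EOL; // Tel : 12 23 45 88 99
```

Trimming each text node and joining the pieces with a single space is a choice made for this sketch; adjust it to the formatting you need.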

First, remove the element's child elements:

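```php
use Symfony\Component\DomCrawler\Crawler;

// '#idOfTheElement > *' matches the element's child elements (the <button>
// and the <dl>), not #idOfTheElement itself; detach each one from the DOM.
$crawler1->filter('#idOfTheElement > *')->each(function (Crawler $crawler) {
    foreach ($crawler as $node) {
        $node->parentNode->removeChild($node);
    }
});
```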

Then get the text, which now comes only from the element's own text nodes:

```php
$cleanContent = $crawler1->filter('#idOfTheElement')->text();
```
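The same two steps, removing the child elements and then reading what is left, can be sketched with PHP's built-in DOM extension. It is used here instead of DomCrawler so the sketch runs on its own; the HTML is a trimmed copy of the question's markup with the id written without the `#`. Note that this approach modifies the document, so later queries on it will no longer see the removed elements.

```php
<?php
// Sketch with PHP's built-in DOM extension (not DomCrawler).
libxml_use_internal_errors(true); // tolerate loose HTML without warnings

$html = '<ul><li id="idOfTheElement">Tel :<button type="button">text :</button>'
    . '<dl><dt><a name="tooltipOpposeMkt"></a></dt><dd><div class="wrapper">'
    . '<p><strong>Signification des pictogrammes</strong></p></div></dd></dl>'
    . '12 23 45 88 99</li></ul>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$li = $xpath->query('//*[@id="idOfTheElement"]')->item(0);

// Step 1: collect the child elements first. childNodes is a live list, so
// removing nodes while iterating over it directly can skip some of them.
$toRemove = [];
foreach ($li->childNodes as $child) {
    if ($child instanceof DOMElement) {
        $toRemove[] = $child;
    }
}
foreach ($toRemove as $child) {
    $li->removeChild($child);
}

// Step 2: only the element's own text nodes remain.
$cleanContent = $li->textContent;
echo $cleanContent, PHP_EOL; // Tel :12 23 45 88 99
```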
leealex