Symfony 2 Dom Crawler: how to get only text() in Element

Question

I'm using DomCrawler and want to get only the element's own text (without the text inside its child tags).

use Symfony\Component\DomCrawler\Crawler;

$html = <<<EOT
  <div class="coucu">
    Get Description <span>Coucu</span>
  </div>
EOT;

$crawler = new Crawler($html);
$text = $crawler->filter('.coucu')->first()->text();

Output: Get Description Coucu

I want to output only: Get Description
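For reference: text() returns the text of the element and all of its descendants, which is why the span's "Coucu" comes along. The element's own text is just its direct text-node children. A minimal sketch of that idea, assuming plain PHP DOM (ext-dom) rather than the Crawler API; ownText() is a hypothetical helper, and the XPath expression is a hand-written stand-in for the CSS selector .coucu. With the Crawler, the same loop should apply to the DOMElement you get by iterating the Crawler.

```php
<?php
// Sketch using PHP's built-in DOM extension (ext-dom), not the Crawler API.
// ownText() is a hypothetical helper: it joins only the direct text-node
// children of an element and never looks inside child elements.
function ownText(DOMNode $element): string
{
    $text = '';
    foreach ($element->childNodes as $child) {
        // Text inside <span> etc. lives in grandchildren, so it is never visited.
        if ($child->nodeType === XML_TEXT_NODE) {
            $text .= $child->nodeValue;
        }
    }
    return trim($text);
}

$html = <<<EOT
<div class="coucu">
  Get Description <span>Coucu</span>
</div>
EOT;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
// Hand-written XPath approximation of the CSS selector "div.coucu".
$div = $xpath->query("//div[contains(concat(' ', normalize-space(@class), ' '), ' coucu ')]")->item(0);

echo ownText($div), PHP_EOL; // Get Description
```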

UPDATE:

I found a solution for this, but it's a really bad one:

...
$html = $crawler->filter('.coucu')->html();
// strip_tags_content() as found at https://php.net/strip_tags
$html = strip_tags_content($html, 'span');

I don't think there is a method for this, but you can try `$text = $crawler->filter('.coucu')->first()->extract(array('_text'));`. I believe it will return the same result, but it's still worth a shot — Nawfal Serrar, May 08 '15 at 08:48
I guess that `strip_tags_content` is from https://gist.github.com/marcanuy/7651298. I personally don't like regexes for HTML; they lead to bad stuff (https://stackoverflow.com/questions/590747/using-regular-expressions-to-parse-html-why-not). — reallynice, Apr 06 '18 at 15:15

score 4 · Answer 1 · answered May 26 '15 at 14:36

Ran into the same situation. I ended up going with:

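// Inner HTML of the div, e.g. "Get Description <span>Coucu</span>"
$html = $crawler->filter('.coucu')->html();
// Keep only what comes before the first "<span"
$html = explode("<span", $html);
echo trim($html[0]);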
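The explode() trick depends on the literal string "<span" and only keeps the text before it. A sketch of the same "text before the first child element" idea done on the DOM instead of on the HTML string, assuming plain ext-dom rather than the Crawler; textBeforeFirstElement() is a hypothetical helper. The sample uses a <b> tag, which explode("<span", ...) would not split on at all.

```php
<?php
// Sketch (plain ext-dom): a structural version of the explode("<span") idea.
// textBeforeFirstElement() is a hypothetical helper that collects text nodes
// only until the first child element of any tag name is reached.
function textBeforeFirstElement(DOMNode $element): string
{
    $text = '';
    foreach ($element->childNodes as $child) {
        if ($child->nodeType === XML_ELEMENT_NODE) {
            break; // stop at the first <span>, <b>, <a>, ...
        }
        if ($child->nodeType === XML_TEXT_NODE) {
            $text .= $child->nodeValue;
        }
    }
    return trim($text);
}

$doc = new DOMDocument();
// A <b> tag instead of <span>: explode("<span", ...) would return the whole string here.
$doc->loadHTML('<div class="coucu">Get Description <b>Coucu</b> more</div>');
$div = $doc->getElementsByTagName('div')->item(0);

echo textBeforeFirstElement($div), PHP_EOL; // Get Description
```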

wkm

score 3 · Answer 2 · answered May 18 '15 at 01:29

Based on the criteria in your question, I think you would be best served by changing your CSS selector to: $crawler = $crawler->filter('div.coucu > span')

From there you can use $span_text = $crawler->text();

Or, to simplify things: $text = $crawler->filter('div.coucu > span')->text();

The text() method returns the text content (nodeValue) of the first node in the list.
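Note that div.coucu > span selects the span itself, so text() gives "Coucu", which is the part the question wants to drop. In XPath, a text() step selects the opposite: only the div's direct text nodes. A sketch using PHP's DOMXPath directly (an assumption; it does not go through the Crawler), with a hand-written XPath approximation of the CSS class selector:

```php
<?php
// Sketch with PHP's DOMXPath (not the Crawler API) comparing two selections.
$doc = new DOMDocument();
$doc->loadHTML('<div class="coucu">Get Description <span>Coucu</span></div>');
$xpath = new DOMXPath($doc);

// Hand-written approximation of what the CSS selector "div.coucu" matches.
$divPath = "//div[contains(concat(' ', normalize-space(@class), ' '), ' coucu ')]";

// "div.coucu > span": the child element, i.e. the text the question wants to drop.
$spanText = $xpath->query($divPath . '/span')->item(0)->textContent;

// The text() node test selects only the div's own (direct) text nodes.
$ownText = '';
foreach ($xpath->query($divPath . '/text()') as $textNode) {
    $ownText .= $textNode->nodeValue;
}
$ownText = trim($ownText);

echo $spanText, PHP_EOL; // Coucu
echo $ownText, PHP_EOL;  // Get Description
```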

Shaun Bramley

I want to get "Get Description Coucu". – Tue Vo May 25 '15 at 18:25

score 1 · Answer 3 · answered Feb 22 '19 at 05:59

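use Symfony\Component\DomCrawler\Crawler;

function extractCurrentText(Crawler $crawler)
{
  // Work on a copy so the original document is not modified
  $clone = new Crawler();
  $clone->addHtmlContent("<body><div>" . $crawler->html() . "</div></body>", "UTF-8");
  // Remove the wrapper div's child elements, leaving only its own text nodes
  $clone->filter("div")->children()->each(function (Crawler $child) {
    $node = $child->getNode(0);
    $node->parentNode->removeChild($node);
  });
  return $clone->text();
}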
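extractCurrentText() works on a copy, so the original crawler's document is left intact. The same non-destructive idea in plain ext-dom might look like this: a sketch, not the answer's code, using cloneNode(true) instead of re-parsing the HTML; ownTextViaClone() is a hypothetical name.

```php
<?php
// Sketch with PHP's built-in DOM extension (not Symfony's Crawler).
// ownTextViaClone() is a hypothetical helper: it deep-copies the element,
// removes the copy's child elements, and reads the text that is left.
function ownTextViaClone(DOMElement $element): string
{
    $copy = $element->cloneNode(true); // deep copy, not attached to the tree
    // Snapshot the children first, since childNodes is a live list.
    foreach (iterator_to_array($copy->childNodes) as $child) {
        if ($child->nodeType === XML_ELEMENT_NODE) {
            $copy->removeChild($child);
        }
    }
    return trim($copy->textContent);
}

$doc = new DOMDocument();
$doc->loadHTML('<div class="coucu">Get Description <span>Coucu</span></div>');
$div = $doc->getElementsByTagName('div')->item(0);

echo ownTextViaClone($div), PHP_EOL;                      // Get Description
echo $div->getElementsByTagName('span')->length, PHP_EOL; // 1: the original keeps its <span>
```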

score 0 · Answer 4 · answered Apr 06 '18 at 15:17

The HTML-removing solution is based on regexes that strip the HTML away (a bad idea; see Using regular expressions to parse HTML: why not?), and the explode solution is limited.

I ended up working by difference: get all the text, then remove the text that doesn't belong to the element itself (its children's text) with str_replace.
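A sketch of the "by difference" idea with plain ext-dom, assuming you subtract the text of each direct child element; ownTextByDifference() is a hypothetical helper. It also shows the catch with str_replace(): every occurrence is removed, including matching words in the element's own text.

```php
<?php
// Sketch with PHP's built-in DOM extension (not Symfony's Crawler) of the
// "by difference" idea. ownTextByDifference() is a hypothetical helper name.
function ownTextByDifference(DOMElement $element): string
{
    $text = $element->textContent; // own text plus every descendant's text
    foreach ($element->childNodes as $child) {
        // Subtract each child element's text; skip children with no text.
        if ($child->nodeType === XML_ELEMENT_NODE && $child->textContent !== '') {
            $text = str_replace($child->textContent, '', $text);
        }
    }
    return trim($text);
}

$doc = new DOMDocument();
$doc->loadHTML('<div>Get Description <span>Coucu</span></div><p>Coucu says hi <span>Coucu</span></p>');
$div = $doc->getElementsByTagName('div')->item(0);
$p = $doc->getElementsByTagName('p')->item(0);

echo ownTextByDifference($div), PHP_EOL; // Get Description
echo ownTextByDifference($p), PHP_EOL;   // says hi (the own "Coucu" is removed too)
```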

score 0 · Answer 5 · answered Nov 21 '19 at 14:18

This works nicely without hacky workarounds:

// Detach every child element of .coucu (this modifies the document in place)
$crawler->filter('.coucu')->children()->each(function (Crawler $child) {
    $child->getNode(0)->parentNode->removeChild($child->getNode(0));
});
$crawler->text(); // Get Description
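This removes the child elements from the document itself, so later reads of the same crawler no longer see the span. If you do the same with plain ext-dom, there is one pitfall: childNodes is a live list, so removing nodes while iterating it directly may miss some of them. As far as I can tell, the Crawler version avoids this because children() collects the nodes into a new Crawler before each() runs. A sketch (plain ext-dom, not the answer's code) that snapshots the list with iterator_to_array() first:

```php
<?php
// Sketch (plain ext-dom) of in-place removal. This mutates the document:
// after it runs, the <span> elements are gone for good.
$doc = new DOMDocument();
$doc->loadHTML('<div class="coucu">Get Description <span>Coucu</span><span>More</span></div>');
$div = $doc->getElementsByTagName('div')->item(0);

// childNodes is live; removing while iterating it directly may miss nodes,
// so iterate over an array snapshot instead.
foreach (iterator_to_array($div->childNodes) as $child) {
    if ($child->nodeType === XML_ELEMENT_NODE) {
        $div->removeChild($child);
    }
}

echo trim($div->textContent), PHP_EOL; // Get Description
```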

Abdessamad

score 0 · Answer 6 · answered Jan 01 '20 at 11:55

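// Inner HTML of the div and of its <span> child
$div = $crawler->filter('.coucu')->html();
$span = $crawler->filter('.coucu > span')->html();
// Drop the span's content from the div's HTML, then strip the now-empty tags
$text = strip_tags(str_replace($span, '', $div));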

apprentice
