I'm using PHP Simple HTML DOM Parser to get text from a webpage. The page I need to manipulate is something like:

<html>
<head>
<title>title</title>
</head>
<body>
<div id="content">
<h1>HELLO</h1>
Hello, world!
</div>
</body>
</html>

I need to get the h1 element and the text that is not inside any tag. To get the h1, I use this code:

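// Simple HTML DOM must be loaded first (the path depends on your setup)
include 'simple_html_dom.php';

$html = file_get_html("remote_page.html");
foreach($html->find('#content') as $text){
    // first <h1> inside #content
    echo "H1: ".$text->find('h1', 0)->plaintext;
}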

But how do I get the other text? I also tried this inside the foreach, but I get the full text:

$text->plaintext;

but it also returned the text of the H1 tag.

hakre
Christian Giupponi

4 Answers

You can simply strip the HTML tags using strip_tags():

<?php
strip_tags($input, '<br>');
?>
jrbedard
Peachy

Use strip_tags(), as @Peachy pointed out. However, passing '<br>' as the second argument tells strip_tags() to keep <br> tags in the output, which is unnecessary here. In your case,

<?php
    strip_tags($text);
?>

would work as you'd like, given that you are only selecting the contents of the element with id "content".
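A quick way to check what strip_tags() does with the question's markup is to run it on the #content HTML as a plain string. This is a minimal sketch that assumes the div's HTML is hard-coded rather than obtained from Simple HTML DOM. It shows that the tags are removed while the text inside them, including the heading's "HELLO", is kept, and that the '<br>' argument makes no difference when the markup contains no <br> tags.

```php
<?php
// Assumption: the #content markup is hard-coded here instead of being
// taken from a Simple HTML DOM node.
$content = "<div id=\"content\">\n<h1>HELLO</h1>\nHello, world!\n</div>";

// strip_tags() removes the tags themselves but keeps the text between them,
// so the text of the <h1> is still part of the result.
$stripped = strip_tags($content);
echo trim($stripped), "\n";

// '<br>' as the allowed-tags argument would keep <br> tags; this markup
// has none, so the result is the same as above.
$keepBr = strip_tags($content, '<br>');
var_dump($keepBr === $stripped);
```

Whether this is enough depends on whether the heading text should appear in the output along with the loose text.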

NonCreature0714

Try this:

echo "H1: ".$text->find('h1', 0)->innertext;

It looks like $text->find('text', 2); gets what you're looking for; however, I'm not sure how well that will work when the number of text nodes is unknown. I'll keep looking.
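One way to avoid depending on a fixed text-node index is to select only the text nodes that are direct children of the div. The sketch below swaps in PHP's bundled DOM extension (DOMDocument and DOMXPath) instead of Simple HTML DOM, assumes that extension is enabled, and uses a hard-coded copy of the question's page instead of fetching remote_page.html. The helper name extractContent is hypothetical. The XPath `//div[@id="content"]/text()` matches text children of the div but not the text nested inside <h1>, so it works however many text nodes there are.

```php
<?php
// Hypothetical helper: returns the <h1> text and the loose text of #content,
// using DOMDocument/DOMXPath rather than Simple HTML DOM.
function extractContent(string $html): array
{
    $doc = new DOMDocument();
    // Real-world pages are rarely valid HTML; collect libxml warnings
    // instead of printing them, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // First <h1> that is a direct child of the #content div (or null).
    $h1Node = $xpath->query('//div[@id="content"]/h1')->item(0);
    $h1 = $h1Node !== null ? trim($h1Node->textContent) : '';

    // text() selects only text nodes that are direct children of the div,
    // so "HELLO" inside <h1> is skipped; whitespace-only nodes are dropped.
    $parts = [];
    foreach ($xpath->query('//div[@id="content"]/text()') as $node) {
        $value = trim($node->nodeValue);
        if ($value !== '') {
            $parts[] = $value;
        }
    }

    return ['h1' => $h1, 'text' => implode(' ', $parts)];
}

// Assumption: hard-coded copy of the page from the question.
$page = <<<HTML
<html>
<head>
<title>title</title>
</head>
<body>
<div id="content">
<h1>HELLO</h1>
Hello, world!
</div>
</body>
</html>
HTML;

$result = extractContent($page);
echo "H1: " . $result['h1'] . "\n";
echo "Text: " . $result['text'] . "\n";
```

Joining the remaining pieces with a space is just one choice; returning the $parts array instead would keep each loose text segment separate.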

Korvin Szanto