
I have a string like the one below:

<p>&nbsp;Hello World, this is StackOverflow&#39;s question details page</p>

I want to extract the text from the HTML above as Hello World, this is StackOverflow's question details page. Note that I want to remove the &nbsp; as well.

How can we achieve this in PHP? I tried a few functions (strip_tags, html_entity_decode, etc.), but each of them fails under some conditions.

Please help, Thanks!

Edit: the code I am trying is below, but it's not working :( It leaves &nbsp;, &#39; and similar entities in the output.

$TMP_DESCR = trim(strip_tags($rs['description']));
djmzfKnm

4 Answers


The code below worked for me, although I had to run str_replace on the non-breaking space first.

$string = "<p>&nbsp;Hello World, this is StackOverflow&#39;s question details page</p>";
echo htmlspecialchars_decode(trim(strip_tags(str_replace('&nbsp;', '', $string))), ENT_QUOTES);
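
For input that contains entities beyond the two in the question, a slightly more general variant may help. This is only a sketch: html_to_text() is a hypothetical helper name, the input is assumed to be UTF-8, and the "\u{00A0}" escape needs PHP 7 or later. It decodes every entity with html_entity_decode and then turns the resulting non-breaking space into a plain space, so trim() can remove it.

```php
<?php
// Hypothetical helper: html_to_text() is not a built-in PHP function.
function html_to_text(string $html): string
{
    // 1. Drop the markup while the entities are still encoded.
    $text = strip_tags($html);
    // 2. Decode named and numeric entities (&nbsp;, &#39;, &amp;, ...) as UTF-8.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // 3. &nbsp; decodes to U+00A0, which trim() ignores by default,
    //    so map it to an ordinary space before trimming.
    $text = str_replace("\u{00A0}", ' ', $text);
    return trim($text);
}

echo html_to_text("<p>&nbsp;Hello World, this is StackOverflow&#39;s question details page</p>");
// Hello World, this is StackOverflow's question details page
```

Replacing U+00A0 after decoding, rather than removing the literal '&nbsp;' text beforehand, also covers the numeric form &#160;.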
Aaron W.
  • Yes, that's working for me as well. If there is no solution for `&nbsp;` then it's fine, we can go with the replace. Thanks for the help! – djmzfKnm Feb 02 '11 at 12:11

strip_tags() will get rid of the tags, and trim() should get rid of the whitespace. I'm not sure if it will work with non-breaking spaces though.
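
To check the non-breaking space case: trim() with its default character list does not strip U+00A0, which is what &nbsp; becomes once decoded. The snippet below is a small sketch assuming UTF-8 text and PHP 7+ (for the "\u{...}"-free but Unicode-aware pattern shown); the variable names are illustrative only.

```php
<?php
// &nbsp; decodes to the UTF-8 bytes of U+00A0 (non-breaking space).
$decoded = html_entity_decode('&nbsp;Hello World&nbsp;', ENT_QUOTES, 'UTF-8');

// Plain trim() leaves the non-breaking spaces in place.
var_dump(trim($decoded) === 'Hello World');   // bool(false)

// A Unicode-aware pattern removes ordinary whitespace and U+00A0 from both ends.
$trimmed = preg_replace('/^[\s\x{00A0}]+|[\s\x{00A0}]+$/u', '', $decoded);
var_dump($trimmed === 'Hello World');         // bool(true)
```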

sevenseacat

First, you'll have to call trim() on the HTML to remove the white space. http://php.net/manual/en/function.trim.php

Then strip_tags, then html_entity_decode.

So: html_entity_decode(strip_tags(trim($html)));
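
The order of strip_tags and html_entity_decode matters when the text itself contains escaped markup. A small sketch with a made-up sample string (not from the question) shows the difference:

```php
<?php
$html = '<p>Use the &lt;b&gt; tag for bold text</p>';

// Tags first, entities second: the escaped "<b>" survives as literal text.
$tagsFirst = html_entity_decode(strip_tags($html), ENT_QUOTES, 'UTF-8');

// Entities first: "&lt;b&gt;" becomes a real tag and strip_tags() removes it.
$decodeFirst = strip_tags(html_entity_decode($html, ENT_QUOTES, 'UTF-8'));

echo $tagsFirst, "\n";   // Use the <b> tag for bold text
echo $decodeFirst, "\n"; // Use the  tag for bold text
```

Note that html_entity_decode turns &nbsp; into U+00A0, which trim() does not remove by default, so the non-breaking space in the question's string still needs separate handling.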

djmzfKnm
Rui Jiang

Probably the nicest and most reliable way to do this is with genuine (X|HT)ML parsing functions like the DOMDocument class:

<?php

$str = "<p>&nbsp;Hello World, this is StackOverflow&#39;s question details page</p>";

$dom = new DOMDocument;
$dom->loadXML(str_replace('&nbsp;', ' ', $str));

echo trim($dom->firstChild->nodeValue);
// "Hello World, this is StackOverflow's question details page"

This is probably slight overkill for this problem, but using the proper parsing functionality is a good habit to get into.


Edit: You can reuse the DOMDocument object, so you only need two lines within the loop:

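$dom = new DOMDocument;
while ($rs = mysql_fetch_assoc($result)) { // or whatever; mysql_* was removed in PHP 7, so use mysqli or PDO there
    $dom->loadHTML(str_replace('&nbsp;', ' ', $rs['description']));
    // loadHTML() adds a doctype and <html><body> wrappers, so firstChild is no longer the <p>;
    // read the text from the root element instead
    $TMP_DESCR = trim($dom->documentElement->textContent);

    // do something with $TMP_DESCR
}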
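
If the descriptions may contain non-ASCII text or sloppy markup, the loop body could be wrapped in a helper along these lines. This is a sketch under assumptions, not part of the original answer: dom_text() is a hypothetical name, the input is assumed to be UTF-8, and the `<?xml encoding="UTF-8">` prefix is a commonly used workaround to make loadHTML() read the input as UTF-8 rather than a documented option.

```php
<?php
// Hypothetical helper; dom_text() is not part of PHP or the original answer.
function dom_text(string $html): string
{
    $dom = new DOMDocument();
    // Collect parser warnings instead of printing them for imperfect markup.
    $previous = libxml_use_internal_errors(true);
    // Assumption: the encoding hint makes loadHTML() treat the input as UTF-8.
    $dom->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $root = $dom->documentElement;
    if ($root === null) {
        return ''; // nothing parseable in the input
    }
    // textContent concatenates every text node under the root element.
    // &nbsp; arrives as U+00A0; turn it into a plain space before trimming.
    return trim(str_replace("\u{00A0}", ' ', $root->textContent));
}

echo dom_text("<p>&nbsp;Hello World, this is StackOverflow&#39;s question details page</p>");
```

Because the parser decodes the entities itself, no str_replace on the literal '&nbsp;' text is needed before loading.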
lonesomeday