I am using Simple HTML DOM to scrape some website data with this code:

$data = array();
foreach($details as $value){

    // Fetch the page inside the loop, where $value is defined
    $html = file_get_html('http://www.example.com/'.$value, false, $context);
    $dataele = array();

    foreach($html->find('*[class=style11]') as $element){

        $houseinfo = trim($element->plaintext, " \t\n\r\0\x0B\xC2\xA0");
        echo $houseinfo;
        echo '<br>';
        array_push($dataele, $houseinfo);

    }
}

However, I found that some &nbsp; entities are still in the data when I insert it into the database. I have tried several methods, but none of them actually removes the &nbsp; HTML entity. The methods I have tried:

$houseinfo = trim($element->plaintext, " \t\n\r\0\x0B\xC2\xA0");
$dataele[1] = html_entity_decode($dataele[1]);
$dataele[1] = str_replace("&nbsp;", "_", $dataele[1]);
$houseinfo = filter_var($houseinfo, FILTER_SANITIZE_STRING);
$dataele[1] = preg_replace("/&#?[a-z0-9]+;/i", "", $dataele[1]);
user3571945
  • `Note: You might wonder why trim(html_entity_decode('&nbsp;')); doesn't reduce the string to an empty string, that's because the '&nbsp;' entity is not ASCII code 32 (which is stripped by trim()) but ASCII code 160 (0xa0) in the default ISO 8859-1 encoding.` – Class Feb 04 '15 at 08:04
  • @Class so could you tell me how I can make it work, please? – user3571945 Feb 04 '15 at 08:06
  • If you echo out the HTML with a function such as `urlencode` you'll be able to see what characters are hidden, if any, and hence why your stuff isn't being replaced. – h2ooooooo Feb 04 '15 at 08:07
  • [Does html_entity_decode replaces &nbsp; also? If not how to replace it?](http://stackoverflow.com/a/6275467/1700963) – Class Feb 04 '15 at 08:11
  • @h2ooooooo thank you very much, I have found where the problem is. I just found that it's `&nbsp`, not `&nbsp;` ... – user3571945 Feb 04 '15 at 08:37
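
To make the two comments above concrete (the entity decodes to a non-breaking space rather than an ordinary space, and `urlencode` reveals hidden characters), here is a minimal sketch. The sample string is made up and only stands in for `$element->plaintext`; it also assumes the page text is UTF-8.

```php
<?php
// Hypothetical sample value standing in for $element->plaintext.
$raw = "3 rooms&nbsp;";

// Decode entities to UTF-8: &nbsp; becomes U+00A0, i.e. the bytes 0xC2 0xA0.
$decoded = html_entity_decode($raw, ENT_QUOTES | ENT_HTML5, 'UTF-8');

// trim() with its default character list does not strip those bytes.
$trimmed = trim($decoded);

// urlencode() makes the otherwise invisible characters visible.
echo urlencode($raw), "\n";      // the entity as literal text
echo urlencode($trimmed), "\n";  // the decoded non-breaking space
```

If the first line of output still shows an encoded `&` followed by `amp`, the data is probably double-encoded, and a single decode will leave a literal `&nbsp;` behind.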

1 Answer

Hope this helps.

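    // Decode &amp;, &lt;, &gt; and &quot;, then turn every space into a hyphen
    $string = str_replace(' ', '-', htmlspecialchars_decode($element->plaintext));
    // Drop every character that is not in the whitelist below
    $string = preg_replace('/[^A-Za-z0-9-_!@#:$%^&*\/()+={}<>?;, \-]/', '', $string);
    // Turn each run of hyphens back into a single space
    $string = preg_replace('/-+/', ' ', $string);
    echo $string;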
Raj