I have the following piece of HTML:
<td>
<p><span><a href="http://www.someurl.com"><b>
<span>W Bangkok</span></b></a> <br>
106 North Sathorn Road ,Silom, Bangrak<br>
Bangkok, 10500 Thailand<br>
Phone: (66)(2) 344 4000 Fax: (66)(2) 344 4111<o:p></o:p></span></p>
</td>
I want to strip out every space, newline and other invisible character (basically everything but the visible text) and replace each run of them with a single space. I also want to strip out the <br /> and <br> tags.
This is the regex and function I wrote:
function clean_data($str)
{
    // Collapse each run of whitespace / Unicode separator characters into one space
    return trim(preg_replace('/(\p{Zs}|\s|\R|\p{Zl}|\p{Z}|\p{Zp})++/u', ' ', $str));
}
However, the HTML line breaks in the example above seem to give me trouble. This is the output I get:
W Bangkok ‎106 North Sathorn Road ,Silom, Bangrak‎‎ Bangkok, 10500 Thailand‎ Phone: (66)(2) 344 4000 Fax: (66)(2) 344 4111
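The gaps in that output that are not plain spaces appear to be invisible characters, most likely U+200E LEFT-TO-RIGHT MARK. Unicode classifies that character as a format character (category Cf), not as whitespace or a separator, so none of \s, \R or \p{Z} match it. One way to check what the string actually contains is to list every character outside printable ASCII. The sketch below is a hypothetical debugging helper (find_hidden_chars is an illustrative name, not a PHP function); it reports each such character's UTF-8 bytes in hex, so U+200E shows up as e2808e and a no-break space (U+00A0) as c2a0.

```php
<?php
// Hypothetical debugging helper: returns the UTF-8 bytes (as hex) of every
// character outside printable ASCII (0x20-0x7E), so invisible characters
// such as U+200E LEFT-TO-RIGHT MARK (bytes e2 80 8e) become visible.
function find_hidden_chars(string $str): array
{
    // The u modifier makes the character class match whole UTF-8
    // characters instead of single bytes.
    preg_match_all('/[^\x20-\x7E]/u', $str, $matches);
    return array_map('bin2hex', $matches[0]);
}

// "\u{200E}" and "\u{00A0}" are PHP 7+ escapes for U+200E and U+00A0.
$sample = "Bangrak\u{200E}\u{200E} Bangkok,\u{00A0}10500";
print_r(find_hidden_chars($sample)); // e2808e, e2808e, c2a0
```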
How can I write a better regular expression that matches all of those <br /> and <br> tags, plus anything else that might be a space or a newline?
The file is saved as UTF-8; when I save it as ASCII, I get ? instead of ‎.
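A minimal sketch of one possible approach, under these assumptions: every <br> variant (<br>, <br/>, <br />, any case) should become a space, the remaining tags can simply be removed with strip_tags(), and invisible format characters (\p{Cf}) and control characters (\p{Cc}) should be treated like whitespace. clean_data_sketch is a hypothetical name, and this is not the only way to write the regex.

```php
<?php
// Sketch of an alternative to clean_data(). Assumes <br> tags in any
// spelling should become spaces and that invisible Unicode format
// characters (category Cf, e.g. U+200E) count as whitespace.
function clean_data_sketch(string $str): string
{
    // Turn <br>, <br/>, <br /> and <BR> into a space.
    // preg_replace returns null on error, so fall back to an empty string.
    $str = preg_replace('~<br\s*/?>~i', ' ', $str) ?? '';

    // Remove any remaining tags such as <span>, <b> or <o:p>.
    $str = strip_tags($str);

    // Collapse runs of whitespace (\s), Unicode separators (\p{Z}), control
    // characters (\p{Cc}) and invisible format characters (\p{Cf}) into one
    // space. With the u modifier, invalid UTF-8 makes preg_replace return null.
    $str = preg_replace('/[\s\p{Z}\p{Cc}\p{Cf}]+/u', ' ', $str) ?? '';

    return trim($str);
}

$html = "Bangrak\u{200E}<br />\nBangkok, 10500\u{00A0}Thailand<br>";
echo clean_data_sketch($html); // Bangrak Bangkok, 10500 Thailand
```

Treating all of \p{Cf} as whitespace also removes characters such as the soft hyphen (U+00AD) and the zero-width joiner (U+200D). If those matter, listing only the specific code points, for example [\s\p{Z}\x{200E}\x{200F}], is a narrower alternative.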