Possible Duplicate:
How to replace � in a string

I am reading data from an XML sheet coming out of a database. In the raw output I am coming across the sequence "ï¿½" (the bytes EF BF BD shown as individual Latin-1 characters), which is the UTF-8 encoding of "�". A simple search and remove with str_replace does not do the trick when searching for "�" or "ï¿½". Is there any other way to remove this character from a string?

UPDATE:

For reference this is the function that is cleaning up strings for me.

    function db_utf8_convert($str)
    {
        $convmap = array(0x80, 0x10ffff, 0, 0xffffff);
        return preg_replace('/\x{EF}\x{BF}\x{BD}/u', '', mb_encode_numericentity($str, $convmap, "UTF-8"));
    }

labago

2 Answers

You can do this:

$str = 'UTF-8 string meaning "�"';
echo preg_replace('/\x{EF}\x{BF}\x{BD}/u', '', iconv(mb_detect_encoding($str), 'UTF-8', $str));

Output: UTF-8 string meaning ""
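
For comparison, here is a byte-level sketch that avoids the regex engine entirely. It assumes the input contains either U+FFFD encoded as UTF-8 (bytes EF BF BD) or the double-encoded form, where those three bytes were read as Latin-1 and re-encoded as UTF-8 (bytes C3 AF C2 BF C2 BD). The function name `strip_replacement_char` is made up for illustration and is not part of the answer above.

```php
<?php
// Hypothetical helper, not from the original answer.
function strip_replacement_char(string $str): string
{
    $targets = [
        // "ï¿½": EF BF BD misread as Latin-1, then re-encoded as UTF-8
        "\xC3\xAF\xC2\xBF\xC2\xBD",
        // U+FFFD itself, encoded as UTF-8
        "\xEF\xBF\xBD",
    ];

    // str_replace works on raw bytes, so it does not care about the
    // declared encoding and does not fail on invalid UTF-8 input.
    return str_replace($targets, '', $str);
}

echo strip_replacement_char("UTF-8 string meaning \"\xEF\xBF\xBD\""), "\n";
```

Writing the bytes as `\x` escapes keeps the search string independent of the encoding the PHP source file happens to be saved in, which may be why a pasted literal does not match.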

PhearOfRayne
  • I had high hopes for this but it sadly is not working for me. – labago Dec 27 '12 at 20:28
  • @jlane09 If my first answer was not working it is due to the fact your server is using an encoding other than UTF-8, so I updated my answer for you. – PhearOfRayne Dec 27 '12 at 20:39
  • Still didn't want to work. I appreciate the continued effort, though. – labago Dec 27 '12 at 20:46
  • Yeah I figured it might not, since `mb_detect_encoding()` is very limited! Any reason you didn't set your encoding on your DB and Server to UTF-8? This would prevent a lot of future and current issues. – PhearOfRayne Dec 27 '12 at 20:49
  • The DB I am inserting these strings into is encoded in UTF-16. The DB it is coming out of I am not sure and have no control over. The server it is running on is UTF-8 – labago Dec 27 '12 at 20:52
  • You should be using one or the other not both. – PhearOfRayne Dec 27 '12 at 21:02

You could do something similar to this:

<?php
$string = "asd fsa fsaf sf � asdfasdfs";

echo preg_replace("/[^\p{Latin} ]/u", "", $string);

Check out this script for more character matches:
http://www.regular-expressions.info/unicode.html#script
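
As a rough illustration of the trade-off, the sketch below compares the whitelist pattern above with a pattern that targets only U+FFFD. The sample string and variable names are invented for the example, and the `\u{FFFD}` escape assumes PHP 7 or later.

```php
<?php
// Sample input made up for this sketch: contains digits, punctuation and U+FFFD.
$string = "Order 66: asd fsa \u{FFFD} asdfasdfs!";

// Whitelist approach: keeps only Latin-script letters and spaces,
// so digits and punctuation are removed along with U+FFFD.
$whitelisted = preg_replace('/[^\p{Latin} ]/u', '', $string);

// Targeted approach: removes only U+FFFD and leaves everything else intact.
$targeted = preg_replace('/\x{FFFD}/u', '', $string);

echo $whitelisted, "\n";
echo $targeted, "\n";
```

Note that with the `/u` modifier `preg_replace` returns `null` when the subject is not valid UTF-8, so either pattern assumes the input has already been converted.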

EDIT

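I also found this, which people say works. Note that it strips a leading UTF-8 byte order mark (bytes EF BB BF) rather than the replacement character, but you could give it a try: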

<?php
function removeBOM($str=""){
    if(substr($str, 0,3) == pack("CCC",0xef,0xbb,0xbf)) {
        $str=substr($str, 3);
    }
    return $str;
}
?>
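
To make the difference between the two byte sequences concrete, here is a small sketch. The helper `strip_leading_bom` is a hypothetical stand-in written for this example; it only removes a byte order mark at the very start of the string and leaves U+FFFD untouched.

```php
<?php
// Hypothetical check, not part of the original answer.
$bom         = "\xEF\xBB\xBF"; // U+FEFF, byte order mark
$replacement = "\xEF\xBF\xBD"; // U+FFFD, replacement character

function strip_leading_bom(string $str): string
{
    // Compare only the first three bytes against the BOM.
    return strncmp($str, "\xEF\xBB\xBF", 3) === 0 ? substr($str, 3) : $str;
}

$withBom  = $bom . "hello";
$withFffd = "he" . $replacement . "llo";

var_dump(strip_leading_bom($withBom) === "hello");    // BOM is removed
var_dump(strip_leading_bom($withFffd) === $withFffd); // U+FFFD is left in place
```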
Get Off My Lawn