
I have read many threads and found some options that work, but now I am mostly curious about why.

I am trying to remove characters like Â and é because Google does not accept them in the XML product feed.

Why does this work:

$string = ereg_replace("[^[:print:]]", ' ', $string);

But neither of these two does?

$string = preg_replace("/[^[:print:]]+/", ' ', $string);

$string = preg_replace("/[^[:print:]]/", ' ', $string);

To put it all in context, here is the full function:

        // Replace every non-printable character with a space
        $string = ereg_replace("[^[:print:]]", ' ', $string);
        // Encode special characters as HTML entities
        $string = htmlentities($string, ENT_QUOTES, 'UTF-8');
        // Decode the entities back into characters
        $string = html_entity_decode($string, ENT_QUOTES, 'UTF-8');
        // Remove backslashes and HTML tags
        $string = strip_tags(stripslashes($string));
        // Convert from ISO-8859-1 to UTF-8 and return the result
        return utf8_encode($string);
    }

1 Answer

That code doesn't work because it only removes characters that are not in the POSIX [:print:] character class, which consists of the printable characters. Characters such as á and É are all printable, so they are left in place.
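One way to avoid depending on what a given regex engine treats as printable is to spell out the allowed byte range explicitly. The following is only a minimal sketch, assuming the goal is to keep printable ASCII and nothing else; the helper name `stripNonAscii` is hypothetical and not part of the original function.

```php
<?php
// Hypothetical helper (an assumption for illustration): replace each run of
// bytes outside printable ASCII (0x20 to 0x7E) with a single space.
// The pattern has no /u modifier, so it works byte by byte, and a
// multi-byte UTF-8 character such as "ï" is matched as one run of bytes.
function stripNonAscii(string $string): string
{
    return preg_replace('/[^\x20-\x7E]+/', ' ', $string);
}

// "na\xC3\xAFve" is "naïve" encoded as UTF-8.
echo stripNonAscii("na\xC3\xAFve"), "\n"; // prints "na ve"
```

Note that this range also replaces tabs and newlines, which may or may not be wanted in a product feed.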

You can find more about POSIX character classes here.

Also, removing accented characters might not always be the best option. Check out this question for alternatives.
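As one possible alternative to deleting accented characters, they can be mapped to plain ASCII equivalents. This is a rough sketch under stated assumptions: the input is UTF-8, the helper name `foldAccents` is hypothetical, and the mapping table covers only the characters mentioned in this thread, so a real feed would need a fuller table or a dedicated transliteration routine.

```php
<?php
// Hypothetical helper (an assumption for illustration): replace a few
// accented UTF-8 characters with unaccented ASCII letters.
function foldAccents(string $string): string
{
    // Keys use \u{...} escapes (PHP 7+) so the source file's own encoding
    // does not matter. The table is deliberately tiny.
    $map = [
        "\u{00E1}" => 'a', // á
        "\u{00E9}" => 'e', // é
        "\u{00C9}" => 'E', // É
        "\u{00C2}" => 'A', // Â
    ];

    // strtr() with an array replaces every occurrence of each key.
    return strtr($string, $map);
}

echo foldAccents("\u{00C9}clair caf\u{00E9}"), "\n"; // prints "Eclair cafe"
```

Characters missing from the table pass through unchanged, so this would normally be combined with a final cleanup step.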
