I have the following PHP string:

\ud83c\udf38Owner IG: deidarasss\n\ud83c\udf38free ongkir BANDA ACEH dan LHOKSEUMAWE\n\u27a1 testimoni: #testydfs\n\ud83d\udcf1LINE: darafitris\nsold=delete\nCLOSE  \ud83d\ude0d\ud83d\ude03

I want to strip all the Unicode escape sequences (the \uXXXX parts) out of this string. How can I do so?

I have tried the following:

    private static function removeEmoji($text) {
        // Match Emoticons
        $regexEmoticons = '/[\x{1F600}-\x{1F64F}]/u';
        $clean_text = preg_replace($regexEmoticons, '', $text);

        // Match Miscellaneous Symbols and Pictographs
        $regexSymbols = '/[\x{1F300}-\x{1F5FF}]/u';
        $clean_text = preg_replace($regexSymbols, '', $clean_text);

        // Match Transport And Map Symbols
        $regexTransport = '/[\x{1F680}-\x{1F6FF}]/u';
        $clean_text = preg_replace($regexTransport, '', $clean_text);

        // Match Miscellaneous Symbols
        $regexMisc = '/[\x{2600}-\x{26FF}]/u';
        $clean_text = preg_replace($regexMisc, '', $clean_text);

        // Match Dingbats
        $regexDingbats = '/[\x{2700}-\x{27BF}]/u';
        $clean_text = preg_replace($regexDingbats, '', $clean_text);

        return $clean_text;
    }

But it doesn't really help.

Rolando Isidoro
adit
  • Have you tried something? – Rizier123 Apr 23 '15 at 08:19
  • @Rizier123 yes, check my function above – adit Apr 23 '15 at 08:22
  • I think you are referring to this: http://stackoverflow.com/questions/1176904/php-how-to-remove-all-non-printable-characters-in-a-string – kazimt9 Apr 23 '15 at 08:23
  • You want to remove the *Unicode escape sequences*. All of those characters are *"Unicode"*... Also, you want to *remove* them? Why not *decode* them instead? Where does that string come from to begin with? Is it JSON by any chance? – deceze Apr 23 '15 at 08:25

1 Answer

I did not find a better way to solve this, so here is my solution using a loop. It is not advisable for very large inputs, because every iteration rescans the whole string.

$input = '\ud83c\udf38Owner IG: deidarasss\n\ud83c\udf38free ongkir BANDA ACEH dan LHOKSEUMAWE\n\u27a1 testimoni: #testydfs\n\ud83d\udcf1LINE: darafitris\nsold=delete\nCLOSE  \ud83d\ude0d\ud83d\ude03';

// In a single-quoted string, '\u' is a literal backslash followed by "u".
while (($pos = strpos($input, '\u')) !== false) {
    // Take the 6-character escape (\uXXXX) found at $pos
    // and remove every occurrence of it from the string.
    $input = str_replace(substr($input, $pos, 6), '', $input);
}
echo $input;
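
For comparison, here is a minimal sketch of two alternatives to the loop. Neither is part of the answer above, and both function names (stripUnicodeEscapes, decodeJsonEscapes) are hypothetical. The first removes every \uXXXX escape in a single preg_replace pass. The second follows the suggestion in the comments and decodes the escapes instead, under the assumption that the text is the body of a JSON string with no surrounding quotes and no unescaped double quotes.

```php
<?php
// Hypothetical helpers for illustration; neither comes from the original answer.

// Remove every literal \uXXXX escape (backslash, "u", four hex digits) in one pass.
function stripUnicodeEscapes(string $text): string
{
    // preg_replace() returns null on failure, so fall back to the input.
    return preg_replace('/\\\\u[0-9a-fA-F]{4}/', '', $text) ?? $text;
}

// Assumption: $text is the body of a JSON string (no surrounding quotes,
// no unescaped double quotes). Returns null if it does not decode to a string.
function decodeJsonEscapes(string $text): ?string
{
    $decoded = json_decode('"' . $text . '"');
    return is_string($decoded) ? $decoded : null;
}

$input = '\ud83c\udf38Owner IG: deidarasss\n\u27a1 testimoni: #testydfs';

echo stripUnicodeEscapes($input), PHP_EOL;
// Owner IG: deidarasss\n testimoni: #testydfs   (the literal \n is kept)
```

Once the string has been decoded, it contains real characters rather than escape sequences, so character-range patterns such as the ones in removeEmoji from the question have something to match. Decoding also turns the literal \n sequences into real newlines.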
Hygison Brandao