I'm looking for a method to strip/remove all emojis from a UTF-8 string. I have found a few solutions, but none of them seems bulletproof.
https://unicode.org/emoji/charts/index.html
I came across the post "Remove emojis from string", which recommends this method, but a few comments say it doesn't catch everything:
function remove_emoji($string) {
    $symbols = "\x{1F100}-\x{1F1FF}" // Enclosed Alphanumeric Supplement
        ."\x{1F300}-\x{1F5FF}" // Miscellaneous Symbols and Pictographs
        ."\x{1F600}-\x{1F64F}" // Emoticons
        ."\x{1F680}-\x{1F6FF}" // Transport and Map Symbols
        ."\x{1F900}-\x{1F9FF}" // Supplemental Symbols and Pictographs
        ."\x{2600}-\x{26FF}" // Miscellaneous Symbols
        ."\x{2700}-\x{27BF}"; // Dingbats
    return preg_replace('/['. $symbols . ']+/u', '', $string);
}
I also found this method, which seems more sophisticated:
function strip_mb4(string $str): string {
    $planes_1_3 = '\xF0[\x90-\xBF][\x80-\xBF]{2}';
    $planes_4_15 = '[\xF1-\xF3][\x80-\xBF]{3}';
    $plane_16 = '\xF4[\x80-\x8F][\x80-\xBF]{2}';
    return preg_replace("/(?:$planes_1_3|$planes_4_15|$plane_16)/", '', $str);
}
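To make the comparison concrete, here is a rough sketch of how the two functions could be checked side by side. It assumes PHP 7.0 or later (for the `\u{...}` string escapes). The four sample code points are illustrative picks, not a complete emoji test set, and both functions are copied from the snippets in this post with only cosmetic changes (single-quoted range strings and added type hints).

```php
<?php
// Side-by-side check of the two functions from the question.
// The sample inputs below are illustrative assumptions, not an exhaustive test set.

function remove_emoji(string $string): string {
    // Explicit code point ranges, matched in UTF-8 mode (/u).
    $symbols = '\x{1F100}-\x{1F1FF}'  // Enclosed Alphanumeric Supplement
        . '\x{1F300}-\x{1F5FF}'       // Miscellaneous Symbols and Pictographs
        . '\x{1F600}-\x{1F64F}'       // Emoticons
        . '\x{1F680}-\x{1F6FF}'       // Transport and Map Symbols
        . '\x{1F900}-\x{1F9FF}'       // Supplemental Symbols and Pictographs
        . '\x{2600}-\x{26FF}'         // Miscellaneous Symbols
        . '\x{2700}-\x{27BF}';        // Dingbats
    return preg_replace('/[' . $symbols . ']+/u', '', $string);
}

function strip_mb4(string $str): string {
    // Byte-level match (no /u): removes every 4-byte UTF-8 sequence.
    $planes_1_3  = '\xF0[\x90-\xBF][\x80-\xBF]{2}';
    $planes_4_15 = '[\xF1-\xF3][\x80-\xBF]{3}';
    $plane_16    = '\xF4[\x80-\x8F][\x80-\xBF]{2}';
    return preg_replace("/(?:$planes_1_3|$planes_4_15|$plane_16)/", '', $str);
}

// Each sample wraps one code point between "a" and "b".
$samples = [
    'emoticon'  => "a\u{1F600}b", // U+1F600, inside the Emoticons range
    'bmp_sun'   => "a\u{2600}b",  // U+2600, 3 bytes in UTF-8 (Miscellaneous Symbols)
    'newer'     => "a\u{1FAE0}b", // U+1FAE0, outside every range listed above
    'cjk_ext_b' => "a\u{20000}b", // U+20000, a CJK ideograph (not an emoji), 4 bytes
];

foreach ($samples as $label => $input) {
    printf(
        "%-10s remove_emoji: %-8s strip_mb4: %s\n",
        $label,
        remove_emoji($input) === 'ab' ? 'removed' : 'kept',
        strip_mb4($input) === 'ab' ? 'removed' : 'kept'
    );
}
```

Reading the patterns, the range-based function should leave U+1FAE0 untouched because it falls outside the listed blocks, while the byte-based function should leave U+2600 in place (it is only 3 bytes long) and strip the non-emoji U+20000 along with the real emojis. So neither looks complete on its own.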
I haven't fully tested the two methods. Can anyone tell me which one would catch most or even all emojis, or do you have an even better method to use?