I need to filter out everything except alphanumeric characters, Latin characters, and emojis.

$str="Hello José' [](){}✅., welcome";

wanted result:

Hello José ✅ welcome

echo preg_replace("/[^\p{Latin} \wp-]/u",'',$str); // this is what i need

But I also need to keep the emojis ✅

I have two patterns: one also deletes the emojis, the other keeps the emojis but deletes everything else. I need the two combined.

preg_replace("/[^\p{Latin} \wp-]/u",'',$str);

preg_replace("/[ -\x{2122}]\s+|\s*[ -\x{2122}]/u",'',$str);
david

1 Answer

preg_replace("/[^\p{Latin} \x{200d}\x{2600}-\x{1FAFF}0-9]/u",'',$str)

The range \x{2600}-\x{1FAFF} still contains some characters that are not emojis. For details, see here. If that matters, you can specify several narrower ranges instead. I've also included the digits 0-9.
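
To illustrate how broad that single range is, and what listing several narrower ranges could look like, here is a minimal sketch. The `$narrow` pattern and its choice of Unicode blocks are my assumptions for illustration, not a complete emoji list; the test string is the one from the question with a few CJK characters appended.

```php
<?php
// Broad pattern from the answer: one big range from U+2600 to U+1FAFF.
$broad = '/[^\p{Latin} \x{200d}\x{2600}-\x{1FAFF}0-9]/u';

// Hypothetical narrower variant; the block choices are assumptions,
// not a complete emoji list.
$narrow = '/[^\p{Latin} 0-9'
    . '\x{200d}\x{FE0F}'      // zero-width joiner and variation selector-16
    . '\x{2600}-\x{27BF}'     // Miscellaneous Symbols, Dingbats
    . '\x{1F1E6}-\x{1F1FF}'   // regional indicator symbols (flags)
    . '\x{1F300}-\x{1FAFF}'   // pictograph and emoticon blocks
    . ']/u';

$str = "Hello José' [](){}✅., welcome 日本語";

echo preg_replace($broad, '', $str), PHP_EOL;   // CJK characters survive
echo preg_replace($narrow, '', $str), PHP_EOL;  // CJK characters are removed
```

The CJK characters sit between U+2600 and U+1FAFF, which is why the broad pattern keeps them. The narrower pattern still lets through some non-emoji symbols inside the listed blocks.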

jspit
  • Some emojis (maybe newer ones) were still being removed, but with the end of the range set to 1FAFF they are all kept: `preg_replace("/[^\p{Latin} \x{2600}-\x{1FAFF}0-9+]/u",'',$str);` – david Oct 31 '20 at 15:02
  • Maybe a little off topic, but with '$str="️‍"; echo "️‍"; echo "
    "; echo $str; echo preg_replace("/[^\p{Latin} \x{2600}-\x{1FAFF}0-9+]/u",'',$str);' the string is displayed as one emoji before preg_replace but as two emojis afterwards. Is there a way to prevent that?
    – david Oct 31 '20 at 19:33
  • @Wiktor Stribiżew Your code doesn't strip HTML markup like . What I really need is alphanumeric + Latin chars + emojis, nothing else. jspit's answer does all that, with only a slight problem with combination emojis: for example, "rainbow flag" becomes "flag" + "rainbow". Otherwise his code works fine. – david Oct 31 '20 at 20:52
  • @david Very well, but you should know there is no way to match all emojis correctly in PHP regex, because it has a pattern limit, and a regex to match emojis correctly is very long. You should try to come up with a reverse logic if possible, else live with a good-enough workaround. – Wiktor Stribiżew Oct 31 '20 at 21:50
  • I've edited my post. Does this solve the problem with the combination emojis? Wiktor is right in principle with his comment. His approach can easily be expanded with the characters that should still be removed: preg_replace("/(?!-)[\p{P}<>$]/u",'',$str) – jspit Nov 01 '20 at 10:59
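
The two points raised in these comments can be checked with a short sketch. It assumes PHP 7.0+ (for the `\u{...}` string escapes) and assumes the combination emoji in question is the rainbow flag, encoded as white flag + variation selector-16 + zero-width joiner (U+200D) + rainbow. The variable names are only illustrative.

```php
<?php
// Rainbow flag as a ZWJ sequence: white flag, VS-16, ZWJ, rainbow (assumed example).
$flag = "\u{1F3F3}\u{FE0F}\u{200D}\u{1F308}";

// Pattern from the comments: U+200D lies outside every allowed range,
// so the joiner is stripped and the sequence falls apart into two emojis.
$withoutZwj = preg_replace('/[^\p{Latin} \x{2600}-\x{1FAFF}0-9+]/u', '', $flag);

// Pattern from the edited answer: U+200D is allowed, so the sequence survives.
$withZwj = preg_replace('/[^\p{Latin} \x{200d}\x{2600}-\x{1FAFF}0-9]/u', '', $flag);

var_dump($withoutZwj === $flag); // bool(false)
var_dump($withZwj === $flag);    // bool(true)

// Reverse logic from the last comment: remove punctuation (except "-"),
// plus "<", ">" and "$", and leave everything else untouched.
$str = "Hello José' [](){}✅., welcome";
echo preg_replace('/(?!-)[\p{P}<>$]/u', '', $str . ' ' . $flag), PHP_EOL;
```

The reverse-logic variant never touches emoji code points, so combination emojis stay intact, but anything not explicitly listed for removal is kept as well.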