
I'm currently having a problem: I don't know how to make a regex match special characters while ignoring emojis.

For example, I want to match the special characters that are not emojis in this string: ❤️❤️

Currently, my regex is:

[^\x00-\x7F]+

Current output: ❤️❤️

Wanted output:

How would I go about fixing this?

John Conde
Luc Smith
  • I think you have it backwards: `` is alphanumeric (`[\pL\pN]+`), _not special_ (`[^\pL\pN]+`). The ultimate solution is to move past the emoji, which are mostly sequences to be matched, and then `(*SKIP)(*FAIL)` them. –  Jul 14 '19 at 19:32
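The `(*SKIP)(*FAIL)` idea from the comment above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `matchNonEmojiSpecials` and the sample string are hypothetical, and `\p{S}\p{M}*` is a deliberately simplified stand-in for "an emoji sequence" (it does not cover ZWJ sequences or flags, for example).

```php
<?php
// Sketch: consume simplified emoji sequences and discard them,
// then match runs of the remaining non-ASCII characters.
function matchNonEmojiSpecials(string $input): array
{
    // First alternative: a symbol plus any trailing marks (e.g. U+FE0F),
    // consumed and then thrown away via (*SKIP)(*FAIL).
    // Second alternative: a run of non-ASCII characters that are not symbols.
    $pattern = '/\p{S}\p{M}*(*SKIP)(*FAIL)'
             . '|(?:(?!\p{S})[^\x00-\x7F])+/u';
    preg_match_all($pattern, $input, $matches);
    return $matches[0];
}

// Hypothetical sample: heart emoji (U+2764 U+FE0F) around three accented letters.
$str = "\u{2764}\u{FE0F}\u{E4}\u{F6}\u{FC}\u{2764}\u{FE0F}";
print_r(matchNonEmojiSpecials($str));
```

With this sample, only the accented letters between the hearts should be returned. Without the skip step, the variation selector U+FE0F that follows each heart would count as a non-ASCII character and end up in the match.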

3 Answers


This expression might work:

$re = '/[\x{1f300}-\x{1f5ff}\x{1f900}-\x{1f9ff}\x{1f600}-\x{1f64f}\x{1f680}-\x{1f6ff}\x{2600}-\x{26ff}\x{2700}-\x{27bf}\x{1f1e6}-\x{1f1ff}\x{1f191}-\x{1f251}\x{1f004}\x{1f0cf}\x{1f170}-\x{1f171}\x{1f17e}-\x{1f17f}\x{1f18e}\x{3030}\x{2b50}\x{2b55}\x{2934}-\x{2935}\x{2b05}-\x{2b07}\x{2b1b}-\x{2b1c}\x{3297}\x{3299}\x{303d}\x{00a9}\x{00ae}\x{2122}\x{23f3}\x{24c2}\x{23e9}-\x{23ef}\x{25b6}\x{23f8}-\x{23fa}]/u';
$str = '❤️❤️';
$subst = '';

echo preg_replace($re, $subst, $str);

Output

The expression is explained on the top right panel of this demo if you wish to explore/simplify/modify it.

Reference:

javascript unicode emoji regular expressions

Emma

Use the following Unicode regex:

[^\p{M}\p{S}]+
  • \p{M} matches characters intended to be combined with another character (here the variation selector U+FE0F).
  • \p{S} matches symbols (❤, U+2764, in this case).

Demo
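As a rough illustration of how this character class behaves in PHP, here is a minimal sketch. The function name `matchNonSymbolRuns` and the sample string are assumptions made for the example; they are not part of the original answer.

```php
<?php
// Collect every run of characters that is neither a mark (\p{M}) nor a symbol (\p{S}).
function matchNonSymbolRuns(string $input): array
{
    // The u modifier makes PCRE treat pattern and subject as UTF-8.
    preg_match_all('/[^\p{M}\p{S}]+/u', $input, $matches);
    return $matches[0];
}

// Hypothetical sample: heart emoji (U+2764 U+FE0F) around three accented letters.
$str = "\u{2764}\u{FE0F}\u{E4}\u{F6}\u{FC}\u{2764}\u{FE0F}";
print_r(matchNonSymbolRuns($str));
```

Note that this class also matches ASCII letters, digits, and whitespace, so it may need to be narrowed if only non-ASCII characters are wanted.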

Junitar
  • \p{S} doesn't match all emojis/symbols, for example: – N S Jun 30 '21 at 14:28
  • @N S you can use `\p{Cs}` to match emojis that are not a combination of a symbol and an enclosing mark. `[\p{Cs}\p{M}\p{S}]+` will match all emojis. – Junitar Jun 30 '21 at 15:36

I think that your post's title does not match its body.

There is virtually no overlap between emoji and alphanumeric characters.
There are a couple of keycap emoji, but since the rest of their sequence
after the first digit doesn't overlap the alphanumerics, it's enough to put
a negative lookahead in front of the alphanumeric class.

'~(?![0-9]\x{FE0F}\x{20E3}|\x{2139})[\pL\pN]+~u'

https://regex101.com/r/1JcUqY/1
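A small sketch of how the lookahead pattern above might be used from PHP. The function name `matchAlphanumericRuns` and the sample string are hypothetical, and the `u` modifier is an assumption on my part: PHP needs it to accept `\x{...}` code points above 0xFF.

```php
<?php
// Match runs of letters and digits, refusing to start a run on a keycap
// digit (digit + U+FE0F + U+20E3) or on U+2139 (the "information" emoji).
function matchAlphanumericRuns(string $input): array
{
    preg_match_all('~(?![0-9]\x{FE0F}\x{20E3}|\x{2139})[\pL\pN]+~u', $input, $matches);
    return $matches[0];
}

// Hypothetical sample: a word, a keycap "1", a plain number, and U+2139.
$str = "abc 1\u{FE0F}\u{20E3} 42 \u{2139}";
print_r(matchAlphanumericRuns($str));
```

For this sample the function should return `abc` and `42`. The lookahead is only checked where a run starts, so a keycap digit that directly follows other letters or digits would still be included in the run.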