
I want a regex that matches all emojis (or most of them) but excludes certain characters (such as “|”|‘|’|…|—).

This regex does the job via negative lookahead:

/(?!\u201C|\u201D|\u2018|\u2019|\u2026|\u2014)(\u00a9|\u00ae|[\u2000-\u3300]|\ud83c[\ud000-\udfff]|\ud83d[\ud000-\udfff]|\ud83e[\ud000-\udfff])/
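For reference, the pattern itself is valid JavaScript. The minimal sketch below assumes an ordinary engine such as Node.js, and `findWithLookahead` is a hypothetical helper name. It shows the pattern matching emojis while skipping the excluded punctuation, which suggests the error comes from the regex engine behind findText rather than from the pattern.

```javascript
// The question's pattern, copied unchanged. Without the `u` flag, JavaScript
// sees an astral emoji as two UTF-16 code units, so the pattern matches a
// high surrogate (\ud83c-\ud83e) followed by a code unit in \ud000-\udfff.
const withLookahead =
  /(?!\u201C|\u201D|\u2018|\u2019|\u2026|\u2014)(\u00a9|\u00ae|[\u2000-\u3300]|\ud83c[\ud000-\udfff]|\ud83d[\ud000-\udfff]|\ud83e[\ud000-\udfff])/g;

// Returns every match in `text`, or an empty array when there is none.
function findWithLookahead(text) {
  return text.match(withLookahead) ?? [];
}

console.log(findWithLookahead("Hi 😀 “quoted” … ©"));
// → [ '😀', '©' ]
```

The negative lookahead rejects “, ”, and … before the alternation is tried, so only the emoji and © are returned.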

But apparently Google Scripts doesn't support this. Error:

Invalid regular expression pattern (?!“|”|‘|’|…|—)(©|®|[ -㌀]|?[퀀-?]|?[퀀-?]|?[퀀-?])

Is there another way to achieve my goal (a regex that works with Google Script's findText)?

Ryan

1 Answer


Option 1

Maybe,

[\u{1f300}-\u{1f5ff}\u{1f900}-\u{1f9ff}\u{1f600}-\u{1f64f}\u{1f680}-\u{1f6ff}\u{2600}-\u{26ff}\u{2700}-\u{27bf}\u{1f1e6}-\u{1f1ff}\u{1f191}-\u{1f251}\u{1f004}\u{1f0cf}\u{1f170}-\u{1f171}\u{1f17e}-\u{1f17f}\u{1f18e}\u{3030}\u{2b50}\u{2b55}\u{2934}-\u{2935}\u{2b05}-\u{2b07}\u{2b1b}-\u{2b1c}\u{3297}\u{3299}\u{303d}\u{00a9}\u{00ae}\u{2122}\u{23f3}\u{24c2}\u{23e9}-\u{23ef}\u{25b6}\u{23f8}-\u{23fa}]

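might work for your desired emojis. The unwanted punctuation is excluded simply because none of “ ” ‘ ’ … — (U+201C, U+201D, U+2018, U+2019, U+2026, U+2014) falls inside any of these ranges. In JavaScript, the \u{...} code point escapes are only recognized with the u flag, as in /[\u{1f300}-\u{1f5ff}]/u.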
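Below is a minimal sketch of Option 1 in plain JavaScript, for example Node.js. `findEmojis` is a hypothetical helper name. As the comment below this answer reports, findText rejected this pattern, so the sketch only shows the class behaving as intended in a standard engine.

```javascript
// Option 1 as a regex literal. The \u{...} escapes denote code points only
// when the `u` flag is set; the flag also makes the class match a whole
// astral emoji instead of one half of a surrogate pair.
const EMOJI =
  /[\u{1f300}-\u{1f5ff}\u{1f900}-\u{1f9ff}\u{1f600}-\u{1f64f}\u{1f680}-\u{1f6ff}\u{2600}-\u{26ff}\u{2700}-\u{27bf}\u{1f1e6}-\u{1f1ff}\u{1f191}-\u{1f251}\u{1f004}\u{1f0cf}\u{1f170}-\u{1f171}\u{1f17e}-\u{1f17f}\u{1f18e}\u{3030}\u{2b50}\u{2b55}\u{2934}-\u{2935}\u{2b05}-\u{2b07}\u{2b1b}-\u{2b1c}\u{3297}\u{3299}\u{303d}\u{00a9}\u{00ae}\u{2122}\u{23f3}\u{24c2}\u{23e9}-\u{23ef}\u{25b6}\u{23f8}-\u{23fa}]/gu;

// Returns every emoji found in `text`, or an empty array.
function findEmojis(text) {
  return text.match(EMOJI) ?? [];
}

console.log(findEmojis("Hi 😀 “quoted” … © ☀"));
// → [ '😀', '©', '☀' ]
```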


Option 2

Otherwise, you might want to negate those undesired chars using char classes, such as:

[these unicode ranges &&[^these unicodes]]

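which would become pretty complicated, yet possible in regex flavors that support class intersection, such as Java. Ordinary JavaScript character classes have no &&; only with the ES2024 v flag does JavaScript support set operations, written [A&&B] for intersection and [A--B] for subtraction.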
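In JavaScript specifically, the subtraction variant of Option 2 can be sketched with the ES2024 v flag, applied to the question's [\u2000-\u3300] range. This assumes an engine with v support (Node.js 20 or later, for example); `buildSubtractedPattern` is a hypothetical name, and it returns null where the flag is unsupported. Whether findText accepts this syntax is not established; since it already rejects lookaheads, it probably does not.

```javascript
// Option 2 via ES2024 `v`-flag set subtraction: [A--B].
// A = the question's U+2000..U+3300 range,
// B = the six unwanted punctuation characters.
function buildSubtractedPattern() {
  const source =
    "[[\\u2000-\\u3300]--[\\u2014\\u2018\\u2019\\u201C\\u201D\\u2026]]";
  try {
    return new RegExp(source, "gv");
  } catch (err) {
    // Engines without `v` support reject the flag with a SyntaxError.
    if (err instanceof SyntaxError) return null;
    throw err;
  }
}

const subtracted = buildSubtractedPattern();
if (subtracted) {
  console.log("— “hi” ※ ☀".match(subtracted));
  // → [ '※', '☀' ]
} else {
  console.log("This engine does not support the v flag.");
}
```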

Option 3

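This option is most likely the simplest way to solve your problem. I guess the issue is that the undesired punctuation characters already fall inside your desired ranges: “ ” ‘ ’ … — are U+201C, U+201D, U+2018, U+2019, U+2026 and U+2014, all of which lie inside [\u2000-\u3300]. Removing them from that range makes the lookahead unnecessary. For example, in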

[\u0100-\u0200]

you might have \u0150 and \u0175 as undesired chars that you want removed from the ranges you already have. (JavaScript \u escapes need exactly four hex digits.)

You can then simply remove those from the range, such as with:

[\u0100-\u014F\u0151-\u0174\u0176-\u0200]

and that alone would solve the problem, with no lookahead needed.
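Option 3 can be automated. Below is a minimal sketch in plain JavaScript. It assumes BMP-only ranges (code points up to U+FFFF, which the \uXXXX notation requires), and `splitRange`, `hex4` and `toCharClass` are hypothetical helper names. Applied to the question's [\u2000-\u3300] range, it prints the class that replaces the lookahead. Whether findText accepts \uXXXX escapes is an assumption here. The question's error message shows that escapes written inside a JavaScript string literal reach findText already decoded to literal characters, which sidesteps the issue.

```javascript
// Splits [start, end] into sub-ranges that skip every code point in `excluded`.
function splitRange(start, end, excluded) {
  const cuts = [...new Set(excluded)]
    .filter((cp) => cp >= start && cp <= end)
    .sort((a, b) => a - b);
  const ranges = [];
  let lo = start;
  for (const cp of cuts) {
    if (cp > lo) ranges.push([lo, cp - 1]); // keep the gap before this cut
    lo = cp + 1;
  }
  if (lo <= end) ranges.push([lo, end]);
  return ranges;
}

// Formats a BMP code point as \uXXXX (four uppercase hex digits).
function hex4(cp) {
  return "\\u" + cp.toString(16).toUpperCase().padStart(4, "0");
}

// Builds a "[...]" class from [lo, hi] pairs; single code points stand alone.
function toCharClass(ranges) {
  const parts = ranges.map(([lo, hi]) =>
    lo === hi ? hex4(lo) : hex4(lo) + "-" + hex4(hi)
  );
  return "[" + parts.join("") + "]";
}

// The toy example above: \u0100-\u0200 without \u0150 and \u0175.
console.log(toCharClass(splitRange(0x100, 0x200, [0x150, 0x175])));
// → [\u0100-\u014F\u0151-\u0174\u0176-\u0200]

// The question's range with “ ” ‘ ’ … — removed.
const unwanted = [..."\u201C\u201D\u2018\u2019\u2026\u2014"].map((ch) =>
  ch.codePointAt(0)
);
const questionClass = toCharClass(splitRange(0x2000, 0x3300, unwanted));
console.log(questionClass);
// → [\u2000-\u2013\u2015-\u2017\u201A-\u201B\u201E-\u2025\u2027-\u3300]
```

Adjacent excluded characters (such as U+2018 and U+2019) simply collapse into one gap, so the output never contains an empty or reversed range.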



Emma
    I appreciate the effort. I see that it works in regex101 but **unfortunately** in Google Scripts gives me `Invalid regular expression pattern [u{1f300}-u{1f5ff}u{1f900}-u{1f9ff}u{1f600}-u{1f64f}u{1f680}-u{1f6ff}u{2600}-u{26ff}u{2700}-u{27bf}u{1f1e6}-u{1f1ff}u{1f191}-u{1f251}u{1f004}u{1f0cf}u{1f170}-u{1f171}u{1f17e}-u{1f17f}u{1f18e}u{3030}u{2b50}u{2b55}u{2934}-u{2935}u{2b05}-u{2b07}u{2b1b}-u{2b1c}u{3297}u{3299}u{303d}u{00a9}u{00ae}u{2122}u{23f3}u{24c2}u{23e9}-u{23ef}u{25b6}u{23f8}-u{23fa}]`. Also, if it worked in Scripts, I'd be curious *how* you excluded `“|”|‘|’|…|—` characters. – Ryan Oct 06 '19 at 20:06
    I will give you an upvote since I didn't clarify/emphasize enough that I need it to work in Google Scripts. – Ryan Oct 06 '19 at 20:08