I think you could also use Unicode character properties. The Unicode Consortium itself provides a regex, which can be adapted for ECMAScript fairly easily (by replacing all occurrences of \x with \u and putting it all on one line). It matches possible emoji, though, so it will yield false positives. It is explicitly advised to validate all matches before assuming they are in fact emoji.
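To illustrate where such false positives come from, here is a minimal sketch (the variable names are my own, not part of the Consortium's regex). Digits, `#`, and `*` carry the `Emoji` property because they can start keycap sequences, so a bare `\p{Emoji}` matches them, while `\p{Emoji_Presentation}` does not:

```javascript
// Loose check: anything with the Emoji property, including digits, '#' and '*'.
const looseEmoji = /\p{Emoji}/u;
// Narrower check: only characters that are displayed as emoji by default.
const defaultEmojiPresentation = /\p{Emoji_Presentation}/u;

// '\u{1F600}' is the grinning face emoji; the rest are plain ASCII characters.
const samples = ['7', '#', '*', 'a', '\u{1F600}'];

const looseHits = samples.filter((s) => looseEmoji.test(s));
const strictHits = samples.filter((s) => defaultEmojiPresentation.test(s));

console.log(looseHits);  // '7', '#', '*' and the grinning face
console.log(strictHits); // only the grinning face
```

Both property escapes need the `u` flag; without it, `\p{...}` is not interpreted as a Unicode property.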
Here's a somewhat stricter version of that regex, which returns fewer false positives, with a mini demo:
const sentence = 'A ticket to 大阪 costs ¥2000 . Repeated emojis: . Crying cat: . Repeated emoji with skin tones: ✊✊✊✊✊✊. Flags: . Scales ⚖️⚖️⚖️.';
const regexpUnicodeModified = /\p{RI}\p{RI}|\p{Emoji}(\p{EMod}+|\u{FE0F}\u{20E3}?|[\u{E0020}-\u{E007E}]+\u{E007F})?(\u{200D}\p{Emoji}(\p{EMod}+|\u{FE0F}\u{20E3}?|[\u{E0020}-\u{E007E}]+\u{E007F})?)+|\p{EPres}(\p{EMod}+|\u{FE0F}\u{20E3}?|[\u{E0020}-\u{E007E}]+\u{E007F})?|\p{Emoji}(\p{EMod}+|\u{FE0F}\u{20E3}?|[\u{E0020}-\u{E007E}]+\u{E007F})/gu
console.log(sentence.match(regexpUnicodeModified));
This will log the following:
> Array ["", "", "", "", "✊", "✊", "✊", "✊", "✊", "✊", "", "", "⚖️", "⚖️", "⚖️"]
which means it matches:
- simple emoji
- emoji with modifiers (skin tones)
- country flags
- region (subdivision) flags, which are encoded as tag sequences
- emoji presentation sequences
Note that I don't see how this could be used for replacing specific emoji with images, as the OP wanted, but it does make it possible to place the emoji inside extra tags and such.
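As a rough sketch of the "extra tags" idea, assuming you want each match wrapped in a `<span>`: the `wrapEmoji` helper and the `emoji` class name below are hypothetical, and the pattern is the same one used in the demo.

```javascript
// Same pattern as in the demo above.
const emojiPattern = /\p{RI}\p{RI}|\p{Emoji}(\p{EMod}+|\u{FE0F}\u{20E3}?|[\u{E0020}-\u{E007E}]+\u{E007F})?(\u{200D}\p{Emoji}(\p{EMod}+|\u{FE0F}\u{20E3}?|[\u{E0020}-\u{E007E}]+\u{E007F})?)+|\p{EPres}(\p{EMod}+|\u{FE0F}\u{20E3}?|[\u{E0020}-\u{E007E}]+\u{E007F})?|\p{Emoji}(\p{EMod}+|\u{FE0F}\u{20E3}?|[\u{E0020}-\u{E007E}]+\u{E007F})/gu;

// Hypothetical helper: wraps every matched emoji sequence in a <span>.
// The class name is an arbitrary choice for illustration.
function wrapEmoji(text, className = 'emoji') {
  return text.replace(
    emojiPattern,
    (match) => `<span class="${className}">${match}</span>`
  );
}

// Scales with a variation selector, and a raised fist with a skin tone modifier.
console.log(wrapEmoji('Scales \u2696\uFE0F and fist \u270A\u{1F3FD}.'));
// → Scales <span class="emoji">⚖️</span> and fist <span class="emoji">✊🏽</span>.
```

Because the callback receives the whole matched sequence, a modifier or variation selector stays inside the same tag as its base character.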