The following EBNF can be used to quickly scan for possible emoji. Those possible emoji can then be verified where necessary by performing validity tests according to the definitions, or checking against the RGI emoji set. It is much simpler than the expressions currently in the definitions. It includes a superset of emoji as a by-product of that simplicity, but the extras can be weeded out by validity tests.
EBNF
possible_emoji := flag_sequence | zwj_element (\x{200D} zwj_element)*
flag_sequence := \p{RI} \p{RI}
zwj_element := \p{Emoji} emoji_modification?
emoji_modification := \p{EMod} | \x{FE0F} \x{20E3}? | tag_modifier
tag_modifier := [\x{E0020}-\x{E007E}]+ \x{E007F}
Notes
\x{200D} = zero-width joiner
\p{RI} = Regional_Indicator
\p{EMod} = Emoji_Modifier
\x{FE0F} = emoji VS
\x{20E3} = enclosing keycap
\x{E00xx} are tags
\x{E007F} = TERM tag
From these EBNF rules a regex can be generated, as below. While this regex may seem complex, it is far simpler than what would result from the definitions. Direct use of the definitions would result in regex expressions which are many times more complicated, and yet still require verification with validity tests.
Regex
\p{RI} \p{RI} | \p{Emoji} ( \p{EMod} | \x{FE0F} \x{20E3}? | [\x{E0020}-\x{E007E}]+ \x{E007F} )? (\x{200D} \p{Emoji} ( \p{EMod} | \x{FE0F} \x{20E3}? | [\x{E0020}-\x{E007E}]+ \x{E007F} )? )*
– Unicode® Technical Standard #51, Unicode Emoji, Section 1.4.9 EBNF and Regex
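To make the "superset" remark concrete, here is a minimal sketch that transliterates the quoted regex into JavaScript as literally as I can (`\x{...}` becomes `\u{...}`, whitespace removed, groups made non-capturing). The name `SPEC_POSSIBLE_EMOJI` and the sample string are mine, not from the standard. It shows that ordinary digits and `#` are reported as possible emoji, because they carry the Emoji property as keycap bases.

```javascript
// Literal transliteration of the UTS #51 regex; the name is hypothetical.
const SPEC_POSSIBLE_EMOJI =
  /\p{RI}\p{RI}|\p{Emoji}(?:\p{EMod}|\u{FE0F}\u{20E3}?|[\u{E0020}-\u{E007E}]+\u{E007F})?(?:\u{200D}\p{Emoji}(?:\p{EMod}|\u{FE0F}\u{20E3}?|[\u{E0020}-\u{E007E}]+\u{E007F})?)*/gu;

// '#', '*' and the digits 0-9 have Emoji=Yes, so plain text yields matches.
const hits = 'Room #101'.match(SPEC_POSSIBLE_EMOJI);
console.log(hits); // [ '#', '1', '0', '1' ]
```

This is why some tightening or a later validity test is needed before treating a match as an emoji.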
Given the above, the following JavaScript-compatible regular expression can be derived (the u flag is required for the \p{...} and \u{...} escapes):
/\p{RI}\p{RI}|\p{Emoji}(\p{EMod}+|\u{FE0F}\u{20E3}?|[\u{E0020}-\u{E007E}]+\u{E007F})?(\u{200D}\p{Emoji}(\p{EMod}+|\u{FE0F}\u{20E3}?|[\u{E0020}-\u{E007E}]+\u{E007F})?)+|\p{EPres}(\p{EMod}+|\u{FE0F}\u{20E3}?|[\u{E0020}-\u{E007E}]+\u{E007F})?|\p{Emoji}(\p{EMod}+|\u{FE0F}\u{20E3}?|[\u{E0020}-\u{E007E}]+\u{E007F})/gu
Paired with a string replace, the regular expression above wraps each matched emoji in span tags. In the replacement string, $& inserts the whole matched substring (the emoji); $1 would insert only the first capture group, which is not what we want here.
'hello world '.replace(/\p{RI}\p{RI}|\p{Emoji}(\p{EMod}+|\u{FE0F}\u{20E3}?|[\u{E0020}-\u{E007E}]+\u{E007F})?(\u{200D}\p{Emoji}(\p{EMod}+|\u{FE0F}\u{20E3}?|[\u{E0020}-\u{E007E}]+\u{E007F})?)+|\p{EPres}(\p{EMod}+|\u{FE0F}\u{20E3}?|[\u{E0020}-\u{E007E}]+\u{E007F})?|\p{Emoji}(\p{EMod}+|\u{FE0F}\u{20E3}?|[\u{E0020}-\u{E007E}]+\u{E007F})/gu, '<span>$&</span>')
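Because the one-line pattern is hard to read, here is a sketch that assembles what should be an equivalent pattern from named parts and wraps the replace call in a function. The names `MOD`, `POSSIBLE_EMOJI` and `wrapEmoji` are hypothetical, and the groups are non-capturing, which does not change what is matched.

```javascript
// Shared suffix: skin-tone modifier(s), VS16 with an optional keycap, or a tag sequence.
const MOD = String.raw`(?:\p{EMod}+|\u{FE0F}\u{20E3}?|[\u{E0020}-\u{E007E}]+\u{E007F})`;

// The four alternatives of the derived regex, in the same order.
const POSSIBLE_EMOJI = new RegExp(
  [
    String.raw`\p{RI}\p{RI}`,                                  // flag sequence
    String.raw`\p{Emoji}${MOD}?(?:\u{200D}\p{Emoji}${MOD}?)+`, // ZWJ sequence (at least one joiner)
    String.raw`\p{EPres}${MOD}?`,                              // character with default emoji presentation
    String.raw`\p{Emoji}${MOD}`,                               // any Emoji character with a required modification
  ].join('|'),
  'gu'
);

// Wrap every match in a span; $& is the whole match.
function wrapEmoji(text) {
  return text.replace(POSSIBLE_EMOJI, '<span>$&</span>');
}

console.log(wrapEmoji('abc 123 \u{1F600}')); // abc 123 <span>😀</span>
```

Requiring a modification in the last alternative is what keeps bare digits, `#` and `*` from matching, while a keycap such as `1` followed by U+FE0F U+20E3 still does.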
This solution comes with the caveat that it will match some things that aren't emoji, as the linked technical standard notes. I haven't yet worked out how to write a validity test from the spec, so I haven't attempted one. You may wish to do so, or weigh the risk of a false positive against the effort required to figure that part out.
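One possible shortcut for the "checking against the RGI emoji set" route, offered as an assumption-laden sketch rather than a validity test derived from the spec: engines that implement the ES2024 `v` flag expose the `RGI_Emoji` property of strings. The helper name `isRgiEmoji` is hypothetical, and on engines without `v` support it returns `null` ("unknown") instead of a verdict.

```javascript
// Assumption: the runtime supports the ES2024 `v` flag and \p{RGI_Emoji}.
// If it does not, RGI_EMOJI stays null and the check reports "unknown".
let RGI_EMOJI = null;
try {
  RGI_EMOJI = new RegExp('^\\p{RGI_Emoji}$', 'v');
} catch {
  // Older engine: the `v` flag is rejected with a SyntaxError.
}

// Hypothetical helper: true/false when the check is available, null otherwise.
function isRgiEmoji(candidate) {
  return RGI_EMOJI === null ? null : RGI_EMOJI.test(candidate);
}

// '1' + ZWJ + '2' matches the possible-emoji pattern but is not a real emoji.
console.log(isRgiEmoji('1\u{200D}2'));
console.log(isRgiEmoji('\u{1F600}'));
```

Each candidate matched by the scanning regex could be passed through such a check, and only those that pass get wrapped.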
Below is a demo in which (lots of) emoji are each wrapped in a span styled with a 1px solid red border.
document.body.innerHTML = `☺️️☹️☠️❣️❤️❤️❤️️️️️️️✋✋✋✋✋✋✌️✌✌✌✌✌☝️☝☝☝☝☝✊✊✊✊✊✊✍️✍✍✍✍✍️♂️♂♂️♂♂️♂♂️♂♂️♂♂️♂♀️♀♀️♀♀️♀♀️♀♀️♀♀️♀♀️♀♀️♀♀️♀♀️♀♀️♀♀️♀♂️♂♂️♂♂️♂♂️♂♂️♂♂️♂♂️♂♂️♂♂️♂♂️♂♂️♂♂️♂♀️♀♀️♀♀️♀♀️♀♀️♀♀️♀♂️♂♂️♂♂️♂♂️♂♂️♂♂️♂♀️♀♀️♀♀️♀♀️♀♀️♀♀️♀♂️♂♂️♂♂️♂♂️♂♂️♂♂️♂♀️♀♀️♀♀️♀♀️♀♀️♀♀️♀♂️♂♂️♂♂️♂♂️♂♂️♂♂️♂♀️♀♀️♀♀️♀♀️♀♀️♀♀️♀♂️♂♂️♂♂️♂♂️♂♂️♂♂️♂♀️♀♀️♀♀️♀♀️♀♀️♀♀️♀♂️♂♂️♂♂️♂♂️♂♂️♂♂️♂♀️♀♀️♀♀️♀♀️♀♀️♀♀️♀♂️♂♂️♂♂️♂♂️♂♂️♂♂️♂♀️♀♀️♀♀️♀♀️♀♀️♀♀️♀♂️♂♂️♂♂️♂♂️♂♂️♂♂️♂♀️♀♀️♀♀️♀♀️♀♀️♀♀️♀♂️♂♂️♂♂️♂♂️♂♂️♂♂️♂♀️♀♀️♀♀️♀♀️♀♀️♀♀️♀♂️♂♂️♂♂️♂♂️♂♂️♂♂️♂♀️♀♀️♀♀️♀♀️♀♀️♀♀️♀⚕️⚕⚕️⚕⚕️⚕⚕️⚕⚕️⚕⚕️⚕⚕️⚕⚕️⚕⚕️⚕⚕️⚕⚕️⚕⚕️⚕⚕️⚕⚕️⚕⚕️⚕⚕️⚕⚕️⚕⚕️⚕⚖️⚖⚖️⚖⚖️⚖⚖️⚖⚖️⚖⚖️⚖⚖️⚖⚖️⚖⚖️⚖⚖️⚖⚖️⚖⚖️⚖⚖️⚖⚖️⚖⚖️⚖⚖️⚖⚖️⚖⚖️⚖✈️✈✈️✈✈️✈✈️✈✈️✈✈️✈✈️✈✈️✈✈️✈✈️✈✈️✈✈️✈✈️✈✈️✈✈️✈✈️✈✈️✈✈️✈♂️♂♂️♂♂️♂♂️♂♂️♂♂️♂♀️♀♀️♀♀️♀♀️♀♀️♀♀️♀️️♂️♂️♂♂️♂♂️♂♂️♂♂️♂️♀️♀️♀♀️♀♀️♀♀️♀♀️♀♂️♂♂️♂♂️♂♂️♂♂️♂♂️♂♀️♀♀️♀♀️♀♀️♀♀️♀♀️♀♂️♂♂️♂♂️♂♂️♂♂️♂♂️♂♀️♀♀️♀♀️♀♀️♀♀️♀♀️♀♂️♂♂️♂♂️♂♂️♂♂️♂♂️♂♀️♀♀️♀♀️♀♀️♀♀️♀♀️♀♂️♂♂️♂♂️♂♂️♂♂️♂♂️♂♀️♀♀️♀♀️♀♀️♀♀️♀♀️♀♂️♂♂️♂♂️♂♂️♂♂️♂♂️♂♀️♀♀️♀♀️♀♀️♀♀️♀♀️♀♂️♂♂️♂♂️♂♂️♂♂️♂♂️♂♀️♀♀️♀♀️♀♀️♀♀️♀♀️♀♂️♂♂️♂♂️♂♂️♂♂️♂♂️♂♀️♀♀️♀♀️♀♀️♀♀️♀♀️♀♂️♂♂️♂♂️♂♂️♂♂️♂♂️♂♀️♀♀️♀♀️♀♀️♀♀️♀♀️♀♂️♂♂️♂♂️♂♂️♂♂️♂♂️♂♀️♀♀️♀♀️♀♀️♀♀️♀♀️♀♂️♂♂️♂♂️♂♂️♂♂️♂♂️♂♀️♀♀️♀♀️♀♀️♀♀️♀♀️♀♂️♂♂️♂♂️♂♂️♂♂️♂♂️♂♀️♀♀️♀♀️♀♀️♀♀️♀♀️♀♂️♂♂️♂♂️♂♂️♂♂️♂♂️♂♀️♀♀️♀♀️♀♀️♀♀️♀♀️♀♂️♂♀️♀♂️♂♀️♀♂️♂♂️♂♂️♂♂️♂♂️♂♂️♂♀️♀♀️♀♀️♀♀️♀♀️♀♀️♀♂️♂♂️♂♂️♂♂️♂♂️♂♂️♂♀️♀♀️♀♀️♀♀️♀♀️♀♀️♀♂️♂♂️♂♂️♂♂️♂♂️♂♂️♂♀️♀♀️♀♀️♀♀️♀♀️♀♀️♀♂️♂♂️♂♂️♂♂️♂♂️♂♂️♂♀️♀♀️♀♀️♀♀️♀♀️♀♀️♀♂️♂♂️♂♂️♂♂️♂♂️♂♂️♂♀️♀♀️♀♀️♀♀️♀♀️♀♀️♀♂️♂♂️♂♂️♂♂️♂♂️♂♂️♂♀️♀♀️♀♀️♀♀️♀♀️♀♀️♀️♂️♂♀️♀♂️♂♂️♂♂️♂♂️♂♂️♂♂️♂♀️♀♀️♀♀️♀♀️♀♀️♀♀️♀♂️♂♂️♂♂️♂♂️♂♂️♂♂️♂♀️♀♀️♀♀️♀♀️♀♀️♀♀️♀⛷️️️♂️♂️♂♂️♂♂️♂♂️♂♂️♂️♀️♀️♀♀️♀♀️♀♀️♀♀️♀♂️♂♂️♂♂️♂♂️♂♂️♂♂️♂♀️♀♀️♀♀️♀♀️♀♀️♀♀️♀♂️♂♂️♂♂️♂♂️♂♂️♂♂️♂♀️♀♀️♀♀️♀♀️♀♀️♀♀️♀♂️♂♂️♂♂️♂♂️♂♂️♂♂️♂♀️♀♀️♀♀️♀♀️♀♀️♀♀️♀⛹️⛹⛹⛹⛹⛹⛹️♂️⛹♂️⛹♂⛹♂️⛹♂⛹♂️⛹♂⛹♂️⛹♂⛹♂️⛹♂⛹️♀️⛹♀️⛹♀⛹♀️⛹♀⛹♀️⛹♀⛹♀️⛹♀⛹♀️⛹♀️️♂️♂️♂♂️♂♂️♂♂️♂♂️♂️♀️♀️♀♀️♀♀️♀♀️♀♀️♀♂️♂♂️♂♂️♂♂️♂♂️♂♂️♂♀️♀♀️♀♀️♀♀️♀♀️♀♀️♀♂️♂♂️♂♂️♂♂️♂♂️♂♂️♂♀️♀♀️♀♀️♀♀️♀♀️♀♀️♀♂️♂♂️♂♂️♂♂️♂♂️♂♂️♂♀️♀♀️♀♀️♀♀️♀♀️♀♀️♀♂️♂♀️♀♂️♂♂️♂♂️♂♂️♂♂️♂♂️♂♀️♀♀️♀♀️♀♀️♀♀️♀♀️♀♂️♂♂️♂♂️♂♂️♂♂️♂♂️♂♀️♀♀️♀♀️♀♀️♀♀️♀♀️♀♂️♂♂️♂♂️♂♂️♂♂️♂♂️♂♀️♀♀️♀♀️♀♀️♀♀️♀♀️♀♂️♂♂️♂♂️♂♂️♂♂️♂♂️♂♀️♀♀️♀♀️♀♀️♀♀️♀♀️♀❤️❤❤️
❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤❤️❤️⬛️❄️❄️️️️☘️️☕️️️⛰️️️️️️️️️️️⛪⛩️⛲⛺️♨️️️️️️⛽⚓⛵️⛴️️✈️️️️⌛⏳⌚⏰⏱️⏲️️️☀️⭐☁️⛅⛈️️️️️️️️️️☂️☔⛱️⚡❄️☃️⛄☄️✨️️️⚽⚾⛳⛸️️♠️♥️♦️♣️♟️️️️⛑️️️️☎️️️⌨️️️️️️️️✉️️✏️✒️️️️️️️️️✂️️️️️⛏️⚒️️️⚔️️⚙️️⚖️⛓️⚗️️️⚰️⚱️♿⚠️⛔☢️☣️⬆️↗️➡️↘️⬇️↙️⬅️↖️↕️↔️↩️↪️⤴️⤵️⚛️️✡️☸️☯️✝️☦️☪️☮️♈♉♊♋♌♍♎♏♐♑♒♓⛎▶️⏩⏭️⏯️◀️⏪⏮️⏫⏬⏸️⏹️⏺️⏏️♀️♂️⚧️✖️➕➖➗♾️‼️⁉️❓❔❕❗〰️⚕️♻️⚜️⭕✅☑️✔️❌❎➰➿〽️✳️✴️❇️©️®️™️#️⃣*️⃣0️⃣1️⃣2️⃣3️⃣4️⃣5️⃣6️⃣7️⃣8️⃣9️⃣️️ℹ️Ⓜ️️️️️㊗️㊙️⚫⚪⬛⬜◼️◻️◾◽▪️▫️️️️⚧️☠️☠`.replace(/\p{RI}\p{RI}|\p{Emoji}(\p{EMod}+|\u{FE0F}\u{20E3}?|[\u{E0020}-\u{E007E}]+\u{E007F})?(\u{200D}\p{Emoji}(\p{EMod}+|\u{FE0F}\u{20E3}?|[\u{E0020}-\u{E007E}]+\u{E007F})?)+|\p{EPres}(\p{EMod}+|\u{FE0F}\u{20E3}?|[\u{E0020}-\u{E007E}]+\u{E007F})?|\p{Emoji}(\p{EMod}+|\u{FE0F}\u{20E3}?|[\u{E0020}-\u{E007E}]+\u{E007F})/gu, '<span>$&</span>');
span {
border: 1px solid red;
}