I would like a regex to match emoji characters in C#. If it matters, I mean the characters from the Windows 8 touch keyboard.
4 Answers
There seems to be an Emoji-to-Unicode standard:
https://en.wikipedia.org/wiki/Emoji#In_Unicode
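So you can probably match each of the Unicode ranges. For example, the range U+1F300 to U+1F5FF lies above U+FFFF, so in a .NET regex (which works on UTF-16 code units) it has to be written as surrogate pairs: `\uD83C[\uDF00-\uDFFF]|\uD83D[\uDC00-\uDDFF]`. A class like `[\u1F30-\u1F5F]` does not work, because `\u` takes exactly four hex digits, so it would match U+1F30 to U+1F5F (Greek Extended) instead. The other ranges work the same way.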
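Because .NET's regex engine sees a character above U+FFFF as two UTF-16 code units, a code-point range has to be split by high surrogate. The sketch below builds such a pattern from a range; `EmojiRanges` and `SurrogateRange` are hypothetical names, and it assumes both ends of the range lie between U+10000 and U+10FFFF.

```csharp
using System;
using System.Collections.Generic;

public static class EmojiRanges
{
    // Hypothetical helper: builds a .NET regex fragment matching every code point
    // in [first, last] (both above U+FFFF), written as UTF-16 surrogate pairs.
    public static string SurrogateRange(int first, int last)
    {
        if (first < 0x10000 || last > 0x10FFFF || first > last)
            throw new ArgumentOutOfRangeException(nameof(first), "Expected a supplementary-plane range.");

        // Split each end point into its high (lead) and low (trail) surrogate.
        int hiFirst = 0xD800 + ((first - 0x10000) >> 10), loFirst = 0xDC00 + ((first - 0x10000) & 0x3FF);
        int hiLast = 0xD800 + ((last - 0x10000) >> 10), loLast = 0xDC00 + ((last - 0x10000) & 0x3FF);

        // Whole range shares one high surrogate: a single low-surrogate class is enough.
        if (hiFirst == hiLast)
            return $@"(?:\u{hiFirst:X4}[\u{loFirst:X4}-\u{loLast:X4}])";

        var parts = new List<string>
        {
            // Tail of the first high surrogate's block.
            $@"\u{hiFirst:X4}[\u{loFirst:X4}-\uDFFF]"
        };
        if (hiLast - hiFirst > 1)
        {
            // Full blocks in between: any low surrogate is allowed.
            parts.Add($@"[\u{hiFirst + 1:X4}-\u{hiLast - 1:X4}][\uDC00-\uDFFF]");
        }
        // Head of the last high surrogate's block.
        parts.Add($@"\u{hiLast:X4}[\uDC00-\u{loLast:X4}]");
        return "(?:" + string.Join("|", parts) + ")";
    }
}
```

For example, `EmojiRanges.SurrogateRange(0x1F300, 0x1F5FF)` returns `(?:\uD83C[\uDF00-\uDFFF]|\uD83D[\uDC00-\uDDFF])`, which can be combined with `|` for further ranges.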

- Does regex support 5 digit unicode characters? I'm using the Expresso regex tester and it doesn't understand that these are 5 digits long. – Jippers Jan 25 '13 at 18:18
- Maybe this will help: http://stackoverflow.com/questions/364009/c-sharp-regular-expressions-with-uxxxxxxxx-characters-in-the-pattern – Ilya Kogan Jan 25 '13 at 20:10
- I guess it's not possible then. Those articles are dated 2008 but say that it's basically not possible to go beyond `\uFFFF`. – Jippers Jan 25 '13 at 20:45
- I was trying to match ✅ (among others) and saw this question, but the answers didn't solve my problem. Finally I used `\p{So}` as the regex pattern. – MohaMad Mar 27 '20 at 01:02
- You're right, @IlyaKogan. I've posted it as an answer now; I hope it helps other developers. – MohaMad Dec 09 '20 at 20:58
`\p{So}|\p{Cs}\p{Cs}(\p{Cf}\p{Cs}\p{Cs})*`
matches all the emoji I've tried, and only those.
`StringInfo` was useful for building the pattern, and in some cases it might be usable directly instead of a regex.
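As a sketch of how `StringInfo` can help derive such a pattern, the hypothetical `EmojiInspector` below splits a string into text elements and lists the `UnicodeCategory` of each UTF-16 code unit, which is where categories like `OtherSymbol`, `Surrogate`, and `Format` show up. How multi-character emoji are grouped into one text element changed in .NET 5 (it now follows extended grapheme clusters), so treat the grouping as version-dependent.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class EmojiInspector
{
    // Splits the input into text elements (user-perceived characters) and
    // records the UnicodeCategory of every UTF-16 code unit in each element.
    public static List<(string Element, UnicodeCategory[] Categories)> Describe(string text)
    {
        var result = new List<(string Element, UnicodeCategory[] Categories)>();
        TextElementEnumerator e = StringInfo.GetTextElementEnumerator(text);
        while (e.MoveNext())
        {
            string element = e.GetTextElement();
            result.Add((element, element.Select(c => char.GetUnicodeCategory(c)).ToArray()));
        }
        return result;
    }
}
```

For ✅ this reports a single `OtherSymbol`, while an emoji such as U+1F600 reports two `Surrogate` code units, which is the distinction the pattern relies on.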
The pattern uses Unicode general categories, as in @MohaMad's answer. Here it is again with comments (this version adds a case for color modifiers):
```csharp
using System.Text.RegularExpressions;

var emojiRegex = new Regex(@"(?x) # Enable free-spacing mode (could have used RegexOptions instead)
    \p{So}          # Match OtherSymbol, like ⏸ and ✅
    |\p{Cs}\p{Cs}   # OR two Surrogates
     \uD83C\p{Cs}   # followed by a color (skin-tone) modifier
                    # (hacky: \uD83C is the high surrogate of U+1F000..U+1F3FF, which
                    # includes the modifiers U+1F3FB..U+1F3FF; it works.)
    |\p{Cs}\p{Cs}   # OR two Surrogates
     (\p{Cf}        # followed by a Format character
      \p{Cs}\p{Cs}) # and two Surrogates
     *              # zero or more times (I've only seen none or once.)");
```
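A minimal sketch of using this pattern to pull emoji out of a string; `EmojiFinder` and `FindAll` are hypothetical names, and the pattern is written on one line with a non-capturing group. Keep in mind that the surrogate-pair alternatives accept any character outside the BMP, not only emoji.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class EmojiFinder
{
    // OtherSymbol, OR a surrogate pair plus a \uD83C-led modifier,
    // OR surrogate pairs joined by Format characters (e.g. U+200D ZERO WIDTH JOINER).
    private static readonly Regex Emoji = new Regex(
        @"\p{So}|\p{Cs}\p{Cs}\uD83C\p{Cs}|\p{Cs}\p{Cs}(?:\p{Cf}\p{Cs}\p{Cs})*");

    // Returns each match in order of appearance.
    public static string[] FindAll(string text) =>
        Emoji.Matches(text).Cast<Match>().Select(m => m.Value).ToArray();
}
```

With this ordering, a thumbs-up followed by a skin-tone modifier comes back as one four-code-unit match, and a ZWJ sequence such as man + ZWJ + woman comes back as a single match as well.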

I used Unicode General Categories and Named Blocks for this problem and described it in a short comment below the accepted answer:
I was trying to match ✅ (among others) and saw this question, but the answers didn't solve my problem. Finally I used this as the regex pattern:
`\p{So}`
For more information about named blocks and Unicode general categories, see the Microsoft Regular Expression help topic.
You can use other block names as well, such as `IsBasicLatin`, `IsLatinExtended-A`, `IsArabic`, `IsCyrillic`, and so on.
You can also match more specific symbol categories from the `S` family, such as currency symbols (`\p{Sc}`) or math symbols (`\p{Sm}`).
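To show general categories and named blocks side by side, here is a small sketch (the `CategoryDemo.MatchAll` helper is hypothetical) that returns every match of a single-character class escape. Because .NET regex works on UTF-16 code units, `\p{So}` matches BMP symbols such as ✅, while an emoji above U+FFFF shows up as two `\p{Cs}` (surrogate) matches instead.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class CategoryDemo
{
    // Returns every UTF-16 code unit in `text` matched by a .NET class escape,
    // e.g. @"\p{So}" (general category) or @"\p{IsCyrillic}" (named block).
    public static string[] MatchAll(string text, string classEscape) =>
        Regex.Matches(text, classEscape).Cast<Match>().Select(m => m.Value).ToArray();
}
```

For instance, on the string `"€5 + Ж"`, `\p{Sc}` finds `€`, `\p{Sm}` finds `+`, and `\p{IsCyrillic}` finds `Ж`.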

- This is the correct way, except that I couldn't match emojis using `\p{So}` (which detects symbols) but rather `\p{Cs}` (which detects surrogate characters). – Cobus Kruger Aug 16 '22 at 15:34
- `\p{Cs}` will match anything in the [Supplementary Multilingual Plane](https://en.wikipedia.org/wiki/Plane_%28Unicode%29#Supplementary_Multilingual_Plane), which will include a lot of non-English text. `\p{So}` only matches 58/1179 of the `Basic_Emoji` defined by [emoji-sequences.txt](https://www.unicode.org/Public/emoji/15.0/emoji-sequences.txt). – brianary Apr 20 '23 at 02:41
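You should be able to plug in the Unicode code values to represent them (with `using System.Text.RegularExpressions;`):

```csharp
// U+1F600 (grinning face) as its two UTF-16 code units; substitute your own values.
Regex regEx = new Regex(@"\uD83D\uDE00");
```

In general the pattern has the form `\uXXXX\uYYYY`, where XXXX and YYYY are the UTF-16 code units of the character you're looking for (for an emoji above U+FFFF, its high and low surrogates). Of course, change the regular expression to fit your needs.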
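To find the `\uXXXX\uYYYY` values for a given code point, a small helper can do the UTF-16 split for you. This is a sketch: `RegexEscapes.ForCodePoint` is a hypothetical name, and it relies on `char.ConvertFromUtf32`, which throws for surrogate code points and values above U+10FFFF.

```csharp
using System.Linq;

public static class RegexEscapes
{
    // Hypothetical helper: turns a Unicode code point into the \uXXXX escapes
    // of its UTF-16 code units, ready to paste into a .NET regex pattern.
    // Code points above U+FFFF become a surrogate pair, e.g. \uD83D\uDE00.
    public static string ForCodePoint(int codePoint) =>
        string.Concat(char.ConvertFromUtf32(codePoint).Select(c => $@"\u{(int)c:X4}"));
}
```

So `RegexEscapes.ForCodePoint(0x1F600)` yields `\uD83D\uDE00`, and `ForCodePoint(0x2705)` yields `\u2705`, since ✅ fits in a single code unit.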
