I ended up using Unicode regular expressions, which are only partially implemented in .NET.
Using this question (C# - Regular expression to find a surrogate pair of a unicode codepoint from any string?), I came up with the following.
Regex
// Matches each emoji found in a string
@"([\uD800-\uDBFF][\uDC00-\uDFFF]\p{M}*){1,5}|\p{So}"
// Matches only if the whole string is a single emoji
@"^(?>(?>[\uD800-\uDBFF][\uDC00-\uDFFF]\p{M}*){1,5}|\p{So})$"
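As a rough sketch of how the two patterns might be used: .NET regexes work on UTF-16 code units, so an emoji outside the Basic Multilingual Plane shows up as a high/low surrogate pair, while BMP symbols such as U+2B50 are caught by \p{So}. The class and method names below (EmojiDemo, Extract, IsSingle) are made up for illustration and are not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class EmojiDemo
{
    // Same two patterns as above; only the field names are new.
    private static readonly Regex FindEmoji = new Regex(
        @"([\uD800-\uDBFF][\uDC00-\uDFFF]\p{M}*){1,5}|\p{So}");

    private static readonly Regex SingleEmoji = new Regex(
        @"^(?>(?>[\uD800-\uDBFF][\uDC00-\uDFFF]\p{M}*){1,5}|\p{So})$");

    // Collects every substring the first pattern matches.
    public static List<string> Extract(string text)
    {
        var found = new List<string>();
        foreach (Match m in FindEmoji.Matches(text))
        {
            found.Add(m.Value);
        }
        return found;
    }

    // True when the whole input is matched by the anchored pattern.
    public static bool IsSingle(string text) => SingleEmoji.IsMatch(text);

    public static void Main()
    {
        // \u2B50 is a BMP symbol (category So); \U0001F600 is stored as a surrogate pair.
        string text = "Rated \u2B50 by \U0001F600!";
        foreach (string emoji in Extract(text))
        {
            Console.WriteLine($"Found a match of {emoji.Length} UTF-16 code unit(s)");
        }

        Console.WriteLine(IsSingle("\U0001F600"));
        Console.WriteLine(IsSingle("\u2B50\u2B50"));
        Console.WriteLine(IsSingle("a"));
    }
}
```

The emoji are written as escape sequences here so the example does not depend on the file's encoding.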
Tests
using System.Text.RegularExpressions;
using Xunit;

public class EmojiTests
{
    private static readonly Regex IsEmoji = new Regex(@"^(?>(?>[\uD800-\uDBFF][\uDC00-\uDFFF]\p{M}*){1,5}|\p{So})$", RegexOptions.Compiled);

    [Theory]
    [InlineData("⭐")]
    [InlineData("")]
    [InlineData("")]
    [InlineData("")]
    [InlineData("")]
    [InlineData("")] // pinched fingers, coming soon :p
    public void ValidEmojiCases(string input)
    {
        Assert.Matches(IsEmoji, input);
    }

    [Theory]
    [InlineData("")]
    [InlineData(":p")]
    [InlineData("a")]
    [InlineData("<")]
    [InlineData("⭐⭐")]
    [InlineData("a")]
    [InlineData("‼️")]
    [InlineData("↔️")]
    public void InvalidEmojiCases(string input)
    {
        Assert.DoesNotMatch(IsEmoji, input);
    }
}
It is not perfect (e.g. it returns true for "™️" and false for "◻️"), but it is good enough for my purposes.