Assess if a C# string is a single emoji or an emoji ZWJ sequence?

Question

What would be a way to tell whether a C# string is a single emoji or a valid emoji ZWJ sequence?

Basically, I would like to be able to recognize any emoji from the official Unicode list: http://www.unicode.org/reports/tr51/tr51-15.html#emoji_data

I can't find a NuGet package for this, and most SO questions don't seem easily applicable to my case (e.g. "Is there a way to check if a string in JS is one single emoji?").

Store all the emoji char values in a HashSet, then look for matches in your text. Hopefully, it would not be a huge string, but more like a comment type of thing. — insane_developer, Aug 05 '20 at 14:17
That sounds like a real pain to maintain, but I guess that would work. — Romain Vergnory, Aug 05 '20 at 14:21
"pain to maintain": just download the lists from the link in your question. — Heretic Monkey, Aug 05 '20 at 14:25
That works for today, but requires special attention for later updates of the document. — Romain Vergnory, Aug 05 '20 at 14:53
Perhaps this answer (https://stackoverflow.com/questions/51502486/how-to-get-correct-length-of-a-string-containing-emojis-in-c-sharp/51644186#51644186) could help — , Aug 06 '20 at 02:48
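
The HashSet idea from the comments above could be sketched roughly as follows. This is only an illustration under assumptions: `EmojiSet` and `Parse` are hypothetical names, and the input is assumed to use the line layout of the Unicode `emoji-test.txt` file (space-separated hex code points, then `;`, then a `#` comment). Files that use `XXXX..YYYY` ranges, such as `emoji-data.txt`, would need extra handling that is not shown here.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical helper, not part of any library.
public static class EmojiSet
{
    // Builds a set of emoji strings from lines such as
    //   "1F468 200D 1F469 200D 1F467 ; fully-qualified # family"
    public static HashSet<string> Parse(IEnumerable<string> lines)
    {
        var set = new HashSet<string>(StringComparer.Ordinal);
        foreach (var raw in lines)
        {
            // Drop the trailing comment; skip blank and comment-only lines.
            var line = raw.Split('#')[0].Trim();
            if (line.Length == 0) continue;

            // The first ';'-separated field holds the hex code points.
            var codePoints = line.Split(';')[0]
                .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
                .Select(hex => int.Parse(hex, NumberStyles.HexNumber, CultureInfo.InvariantCulture));

            // Convert each code point to UTF-16 and join them into one string.
            set.Add(string.Concat(codePoints.Select(cp => char.ConvertFromUtf32(cp))));
        }
        return set;
    }
}
```

With the set loaded once (for example from a locally saved copy of the file), the check becomes `set.Contains(input)`. The maintenance concern raised in the comments still applies: the file has to be refreshed when a new emoji version is published.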

score 0 · Answer 1 · answered Aug 06 '20 at 08:54

I ended up using Unicode regular expressions, which are partially implemented in .NET. Based on this question (C# - Regular expression to find a surrogate pair of a unicode codepoint from any string?), I came up with the following.

Regex

    // Returns the emoji
    @"([\uD800-\uDBFF][\uDC00-\uDFFF]\p{M}*){1,5}|\p{So}"

    // Returns true if the string is a single emoji
    @"^(?>(?>[\uD800-\uDBFF][\uDC00-\uDFFF]\p{M}*){1,5}|\p{So})$"
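
As a rough usage sketch, the two patterns above might be wrapped like this. The class and method names (`EmojiRegexDemo`, `Extract`, `IsSingle`) are made up for illustration; only the pattern strings come from the answer.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical wrapper around the two patterns from the answer.
public static class EmojiRegexDemo
{
    // Finds emoji candidates anywhere inside a longer text.
    private static readonly Regex FindEmoji =
        new Regex(@"([\uD800-\uDBFF][\uDC00-\uDFFF]\p{M}*){1,5}|\p{So}");

    // Anchored variant: the whole string must be one candidate.
    private static readonly Regex IsEmoji =
        new Regex(@"^(?>(?>[\uD800-\uDBFF][\uDC00-\uDFFF]\p{M}*){1,5}|\p{So})$");

    // Returns every matched substring, in order of appearance.
    public static string[] Extract(string text) =>
        FindEmoji.Matches(text).Cast<Match>().Select(m => m.Value).ToArray();

    // Returns true when the entire string matches the anchored pattern.
    public static bool IsSingle(string text) => IsEmoji.IsMatch(text);
}
```

Two caveats follow from how the pattern is built. Any surrogate pair matches, so supplementary-plane characters that are not emoji also pass. And U+200D (zero width joiner) has general category Cf rather than M, so `\p{M}*` does not consume it; ZWJ-joined sequences are therefore worth adding to the test cases explicitly.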

Tests

    public class EmojiTests
    {
        private static readonly Regex IsEmoji = new Regex(@"^(?>(?>[\uD800-\uDBFF][\uDC00-\uDFFF]\p{M}*){1,5}|\p{So})$", RegexOptions.Compiled);

        [Theory]
        [InlineData("⭐")]
        [InlineData("")]
        [InlineData("")]
        [InlineData("")]
        [InlineData("")]
        [InlineData("🤌")] // pinched fingers, coming soon :p
        public void ValidEmojiCases(string input)
        {
            Assert.Matches(IsEmoji, input);
        }

        [Theory]
        [InlineData("")]
        [InlineData(":p")]
        [InlineData("a")]
        [InlineData("<")]
        [InlineData("⭐⭐")]
        [InlineData("a")]
        [InlineData("‼️")]
        [InlineData("↔️")]
        public void InvalidEmojiCases(string input)
        {
            Assert.DoesNotMatch(IsEmoji, input);
        }
    }

It is not perfect (e.g. it returns true for "™️" and false for "◻️"), but it will do.
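
For the "single" half of the question, a complementary check is to count text elements with `System.Globalization.StringInfo`, in the spirit of the string-length answer linked in the comments. The sketch below uses a hypothetical helper name (`TextElementCheck.IsSingleTextElement`). It only tells you that the string is one text element, not that the element is an emoji, so it would have to be combined with a regex or a lookup set. How multi-code-point sequences are grouped depends on the runtime: as far as I know, .NET 5 and later follow the Unicode extended grapheme cluster rules, while older runtimes split some sequences.

```csharp
using System.Globalization;

// Hypothetical helper, not part of any library.
public static class TextElementCheck
{
    // True when the string consists of exactly one text element
    // (a base character plus any combining marks, or a surrogate pair).
    public static bool IsSingleTextElement(string input) =>
        !string.IsNullOrEmpty(input) &&
        new StringInfo(input).LengthInTextElements == 1;
}
```

Used as a pre-filter, this rejects inputs such as "⭐⭐" or ":p" before any emoji-specific test runs.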
