
How can I tell whether a C# string is a single emoji or a valid emoji ZWJ sequence?

I would like to be able to detect any emoji from the official Unicode list: http://www.unicode.org/reports/tr51/tr51-15.html#emoji_data

I can't find a NuGet package for this, and most SO questions don't translate easily to my case (e.g. "Is there a way to check if a string in JS is one single emoji?").

Romain Vergnory

1 Answer


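I ended up using regular expressions with Unicode categories, which .NET only partially supports: general categories such as \p{So} work, but matching is done on UTF-16 code units, so characters outside the Basic Multilingual Plane have to be matched as surrogate pairs. Building on this question (C# - Regular expression to find a surrogate pair of a unicode codepoint from any string?), I came up with the following.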

Regex

    // Matches emoji anywhere in a string
    @"([\uD800-\uDBFF][\uDC00-\uDFFF]\p{M}*){1,5}|\p{So}"

    // Matches only when the whole string is a single emoji
    @"^(?>(?>[\uD800-\uDBFF][\uDC00-\uDFFF]\p{M}*){1,5}|\p{So})$"
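As a sketch of how the two patterns might be called from code, the unanchored one can be used with `Regex.Matches` to pull emoji out of text and the anchored one with `Regex.IsMatch` to validate a whole string. The class and method names (`EmojiRegexDemo`, `ExtractEmoji`, `IsSingleEmoji`) are hypothetical; the patterns are copied unchanged.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical wrapper around the two patterns above.
public static class EmojiRegexDemo
{
    // Unanchored pattern: finds emoji-like runs anywhere in the text.
    private static readonly Regex FindEmoji = new Regex(
        @"([\uD800-\uDBFF][\uDC00-\uDFFF]\p{M}*){1,5}|\p{So}",
        RegexOptions.Compiled);

    // Anchored pattern: the entire string must be one match.
    private static readonly Regex SingleEmoji = new Regex(
        @"^(?>(?>[\uD800-\uDBFF][\uDC00-\uDFFF]\p{M}*){1,5}|\p{So})$",
        RegexOptions.Compiled);

    // Returns the text of every match, in order of appearance.
    public static string[] ExtractEmoji(string text) =>
        FindEmoji.Matches(text).Cast<Match>().Select(m => m.Value).ToArray();

    // True when the whole string is matched by the anchored pattern.
    public static bool IsSingleEmoji(string text) => SingleEmoji.IsMatch(text);
}
```

For example, `ExtractEmoji("I ⭐ this 😀!")` returns `⭐` and `😀`, and `IsSingleEmoji("⭐⭐")` is false. The `{1,5}` quantifier is what lets a flag or a skin-toned emoji (several astral-plane characters in a row) count as one match, but it also groups unrelated adjacent emoji, so `IsSingleEmoji("😀😀")` is true as well.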

Tests

    using System.Text.RegularExpressions;
    using Xunit;

    public class EmojiTests
    {
        private static readonly Regex IsEmoji = new Regex(@"^(?>(?>[\uD800-\uDBFF][\uDC00-\uDFFF]\p{M}*){1,5}|\p{So})$", RegexOptions.Compiled);

        [Theory]
        [InlineData("⭐")]
        [InlineData("")]
        [InlineData("")]
        [InlineData("")]
        [InlineData("")]
        [InlineData("🤌")] // pinched fingers, coming soon :p
        public void ValidEmojiCases(string input)
        {
            Assert.Matches(IsEmoji, input);
        }

        [Theory]
        [InlineData("")]
        [InlineData(":p")]
        [InlineData("a")]
        [InlineData("<")]
        [InlineData("⭐⭐")]
        [InlineData("a")]
        [InlineData("‼️")]
        [InlineData("↔️")]
        public void InvalidEmojiCases(string input)
        {
            Assert.DoesNotMatch(IsEmoji, input);
        }
    }

It is not perfect (e.g. it returns true for "™️" and false for "◻️"), but it is good enough for my needs.
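One gap worth knowing about: U+200D ZERO WIDTH JOINER has general category Cf, not M, so `\p{M}*` does not absorb it and the anchored pattern rejects ZWJ sequences such as a family emoji. If those matter, one possible alternative is to load the emoji-data.txt file that UTS #51 describes and combine it with grapheme-cluster splitting. The sketch below is a heuristic under stated assumptions, not a full UTS #51 validator: `EmojiDataCheck`, `LoadProperty`, `IsSingleEmoji` and the list of "connector" code points are hypothetical choices, it assumes .NET 5 or later (where `StringInfo` follows UAX #29 extended grapheme clusters), and the property you load (`Emoji`, `Emoji_Presentation`, `Extended_Pictographic`) decides the edge cases; for instance, digits, `#` and `*` carry the `Emoji` property.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Hypothetical helper: a heuristic, not a complete UTS #51 implementation.
public static class EmojiDataCheck
{
    // Assumed connectors that may sit inside a sequence without being a base emoji:
    // ZERO WIDTH JOINER, VARIATION SELECTOR-16, COMBINING ENCLOSING KEYCAP.
    private static readonly HashSet<int> Connectors = new HashSet<int> { 0x200D, 0xFE0F, 0x20E3 };

    // Parses lines in the emoji-data.txt layout ("1F466..1F469 ; Emoji_Presentation # ...")
    // and returns every code point listed with the requested property.
    public static HashSet<int> LoadProperty(IEnumerable<string> lines, string property)
    {
        var codePoints = new HashSet<int>();
        foreach (var raw in lines)
        {
            // Drop the trailing comment; skip blank and comment-only lines.
            var line = raw.Split('#')[0].Trim();
            if (line.Length == 0) continue;

            var fields = line.Split(';');
            if (fields.Length < 2 || fields[1].Trim() != property) continue;

            // The first field is a single code point or a "first..last" range.
            var bounds = fields[0].Trim().Split(new[] { ".." }, StringSplitOptions.None);
            int first = int.Parse(bounds[0], NumberStyles.HexNumber);
            int last = bounds.Length > 1 ? int.Parse(bounds[1], NumberStyles.HexNumber) : first;
            for (int cp = first; cp <= last; cp++) codePoints.Add(cp);
        }
        return codePoints;
    }

    // True when the text is one grapheme cluster built only from listed code points,
    // connectors and tag characters, with at least one listed code point.
    public static bool IsSingleEmoji(string text, HashSet<int> emojiCodePoints)
    {
        if (string.IsNullOrEmpty(text)) return false;

        // Relies on .NET 5+ grapheme segmentation to keep ZWJ sequences together.
        if (new StringInfo(text).LengthInTextElements != 1) return false;

        bool sawEmoji = false;
        foreach (Rune rune in text.EnumerateRunes())
        {
            if (emojiCodePoints.Contains(rune.Value)) sawEmoji = true;
            else if (!Connectors.Contains(rune.Value) && !IsTag(rune.Value)) return false;
        }
        return sawEmoji;
    }

    // TAG characters used by subdivision flag sequences (U+E0020..U+E007F).
    private static bool IsTag(int cp) => cp >= 0xE0020 && cp <= 0xE007F;
}
```

With the real file, something like `EmojiDataCheck.LoadProperty(File.ReadLines("emoji-data.txt"), "Emoji_Presentation")` would build the set once at startup (the path is a placeholder).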

Romain Vergnory