How to know if two emojis will be displayed as one emoji?

Question

The emoji 👍🏼 consists of 2 unicodeScalars: U+1F44D, U+1F3FC.

How can this be identified as one 'displayed' emoji, given that iOS renders it as a single glyph?

According to http://stackoverflow.com/a/36332149/1187415, you can consult a Unicode table: http://unicode.org/reports/tr51/#Emoji_Modifiers_Table. — Martin R, Aug 23 '16 at 14:45
@MartinR This is a good hint, but it only covers emojis with skin tone variations. There are other emojis that do not vary by skin tone, e.g. 👨‍❤️‍💋‍👨 consists of U+1F468 U+200D U+2764 U+FE0F U+200D U+1F48B U+200D U+1F468. — Manuel, Aug 23 '16 at 14:58

Martin R · Accepted Answer · 2017-06-07T19:35:20.083

Update for Swift 4 (Xcode 9)

As of Swift 4, an "emoji sequence" is treated as a single grapheme cluster (according to the Unicode 9 standard):

let s = "a👍🏼b👨‍❤️‍💋‍👨"
print(s.count) // 4

so the other workarounds are not needed anymore.
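
To make the Swift 4 behaviour concrete, here is a minimal sketch that builds the two emoji from the question and comments out of their Unicode scalars and compares the different views of the string. The property name `isSingleGraphemeCluster` is a hypothetical helper and not part of the standard library; the printed values assume Swift 4 or later.

```swift
// The skin-tone thumbs-up from the question: U+1F44D followed by the modifier U+1F3FC.
let thumbsUp = "\u{1F44D}\u{1F3FC}"
// The ZWJ sequence from the comments: man, heart, kiss mark, man, joined by U+200D.
let kiss = "\u{1F468}\u{200D}\u{2764}\u{FE0F}\u{200D}\u{1F48B}\u{200D}\u{1F468}"

extension String {
    /// Hypothetical helper: true if the string is exactly one Swift Character
    /// (one extended grapheme cluster). An empty string yields false.
    var isSingleGraphemeCluster: Bool {
        return count == 1
    }
}

print(thumbsUp.unicodeScalars.count) // 2
print(thumbsUp.utf16.count)          // 4 (each scalar above U+FFFF needs a surrogate pair)
print(thumbsUp.count)                // 1
print(kiss.unicodeScalars.count)     // 8
print(kiss.count)                    // 1
print("".isSingleGraphemeCluster)    // false
```

Whether a given sequence is also rendered as one glyph still depends on the font and OS version; the count only reflects the grapheme-breaking rules that the Swift runtime implements.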

(Old answer for Swift 3 and earlier:)

A possible option is to enumerate and count the "composed character sequences" in the string:

let s = "a👍🏼b👨‍❤️‍💋‍👨"
var count = 0
s.enumerateSubstringsInRange(s.startIndex..<s.endIndex,
                             options: .ByComposedCharacterSequences) {
    (char, _, _, _) in
    // Each callback delivers one composed character sequence.
    if char != nil {
        count += 1
    }
}
print(count) // 4

Another option is to find the range of the composed character sequences at a given index:

let s = "👨‍❤️‍💋‍👨"
if s.rangeOfComposedCharacterSequenceAtIndex(s.startIndex) == s.characters.indices {
    print("This is a single composed character")
}

As String extension methods:

// Swift 2.2:
extension String {
    var composedCharacterCount: Int {
        var count = 0
        enumerateSubstringsInRange(characters.indices, options: .ByComposedCharacterSequences) {
            (_, _, _, _) in count += 1
        }
        return count
    }

    var isSingleComposedCharacter: Bool {
        return rangeOfComposedCharacterSequenceAtIndex(startIndex) == characters.indices
    }
}

// Swift 3:
extension String {
    var composedCharacterCount: Int {
        var count = 0
        enumerateSubstrings(in: startIndex..<endIndex, options: .byComposedCharacterSequences) {
            (_, _, _, _) in count += 1
        }
        return count
    }

    var isSingleComposedCharacter: Bool {
        return rangeOfComposedCharacterSequence(at: startIndex) == startIndex..<endIndex
    }
}

Examples:

"👍🏼".composedCharacterCount // 1
"👍🏼".characters.count       // 2

"👨‍❤️‍💋‍👨".composedCharacterCount // 1
"👨‍❤️‍💋‍👨".characters.count       // 4

"🇩🇪🇨🇦".composedCharacterCount // 2
"🇩🇪🇨🇦".characters.count       // 1

As you see, the number of Swift characters (extended grapheme clusters) can be more or less than the number of composed character sequences.
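
As noted in the comments below, `isSingleComposedCharacter` traps on an empty string. The following is a minimal sketch of an empty-safe variant in Swift 3 or later syntax. The name `isSingleComposedCharacterOrFalse` is hypothetical, and returning `false` for an empty string is an assumption; a caller may prefer a different convention.

```swift
import Foundation

extension String {
    /// Hypothetical empty-safe variant of isSingleComposedCharacter.
    /// Assumption: an empty string does not count as a single composed character.
    var isSingleComposedCharacterOrFalse: Bool {
        // Guard first, because asking for the range at startIndex of an
        // empty string is out of bounds.
        guard !isEmpty else { return false }
        return rangeOfComposedCharacterSequence(at: startIndex) == startIndex..<endIndex
    }
}

print("".isSingleComposedCharacterOrFalse)         // false
print("a".isSingleComposedCharacterOrFalse)        // true
print("ab".isSingleComposedCharacterOrFalse)       // false
print("e\u{301}".isSingleComposedCharacterOrFalse) // true ("e" plus a combining acute accent)
```

Results for emoji sequences are deliberately not shown here, because they depend on the Foundation version that performs the composed character segmentation.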

edited Jun 07 '17 at 19:35

answered Aug 23 '16 at 14:57

Martin R


This is awesome. How does this work for ‍❤️‍? It has 6 unicodeScalars, 3 characters, and the rangeOfComposedCharacterSequenceAtIndex(startIndex) is 0..<8. – Manuel Aug 23 '16 at 15:37
@Manuel: It works also for flags (see added examples) which I find even more surprising. – Martin R Aug 23 '16 at 15:40
Shouldn't "🇩🇪🇨🇦".characters.count be 4 instead of 1? – Manuel Aug 23 '16 at 15:43
@Manuel: The "regional indicators" are a strange thing, compare http://stackoverflow.com/questions/26862282/swift-countelements-return-incorrect-value-when-count-flag-emoji. Any sequence of Regional_Indicator (RI) characters is considered a single grapheme cluster. – Martin R Aug 23 '16 at 15:45
@Manuel: `print(Array("‍❤️‍".unicodeScalars))` might be instructive. There are three Swift characters, but 6 Unicode scalars (including U+200D ZERO WIDTH JOINER). The Unicode scalars > U+FFFF consume two index positions (they are internally stored as a UTF-16 surrogate pair). – Unicode is fun! – Martin R Aug 23 '16 at 15:54
Very useful answer, so thanks. isSingleComposedCharacter returns an "index 0 out of range" on an empty string in my Swift 3 test, so anyone implementing the extension needs to ensure the string they're testing isn't empty before the test or that the extension includes a test for an empty string. – JKaz May 11 '17 at 20:08
@JKaz: Thank you for the feedback! Yes, the code indeed assumes a non-empty string. The method could check for an empty string, but should that count as "single-composed" or not? Perhaps better leave that choice to the caller (as you said). – Martin R May 11 '17 at 21:16
@MartinR For Objective-C? – vipinsaini0 Apr 17 '18 at 10:49
@VipinSaini: `enumerateSubstringsInRange` and `rangeOfComposedCharacterSequenceAtIndex` are methods of the `NSString` class in the Foundation library, and can be used from Objective-C without problems. – Martin R Apr 17 '18 at 10:52
@MartinR can you please provide a sample code for this? – vipinsaini0 Apr 17 '18 at 10:55
