The emoji 👍🏼 consists of two Unicode scalars: U+1F44D and U+1F3FC.

How can this be identified as a single 'displayed' emoji, given that iOS renders it as one?

Manuel
  • According to http://stackoverflow.com/a/36332149/1187415, you can consult a Unicode table: http://unicode.org/reports/tr51/#Emoji_Modifiers_Table. – Martin R Aug 23 '16 at 14:45
  • @MartinR This is a good hint, but it only covers emojis with skin tone variations. There are other emojis that are not varied by skin tone, e.g. 👨‍❤️‍💋‍👨 consists of U+1F468 U+200D U+2764 U+FE0F U+200D U+1F48B U+200D U+1F468. – Manuel Aug 23 '16 at 14:58

1 Answer

Update for Swift 4 (Xcode 9)

As of Swift 4, an "emoji sequence" is treated as a single grapheme cluster (in accordance with the Unicode 9 standard):

let s = "a👍🏼b👨‍❤️‍💋‍👨"
print(s.count) // 4

so the other workarounds are not needed anymore.
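
As a minimal sketch of what this means in practice (assuming Swift 4 or later; the variable names are only illustrative, and the strings are written with Unicode escapes so the individual scalars stay visible), the different views of one string report different lengths:

```swift
// Thumbs-up followed by a skin-tone modifier (the emoji from the question).
let thumbsUp = "\u{1F44D}\u{1F3FC}"
// Man, ZWJ, heart, variation selector-16, ZWJ, kiss mark, ZWJ, man.
let kiss = "\u{1F468}\u{200D}\u{2764}\u{FE0F}\u{200D}\u{1F48B}\u{200D}\u{1F468}"

for s in [thumbsUp, kiss] {
    // count: grapheme clusters, unicodeScalars: code points, utf16: UTF-16 code units
    print(s.count, s.unicodeScalars.count, s.utf16.count)
}
// prints "1 2 4", then "1 8 11"

// A string is displayed as a single character if it holds exactly one Character.
let isSingleDisplayedCharacter = thumbsUp.count == 1
print(isSingleDisplayedCharacter) // true
```

Only `count` reflects what the user sees; the other two views are useful mainly for inspecting how a sequence is built up.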


(Old answer for Swift 3 and earlier:)

A possible option is to enumerate and count the "composed character sequences" in the string:

import Foundation

let s = "a👍🏼b👨‍❤️‍💋‍👨"
var count = 0
s.enumerateSubstringsInRange(s.startIndex..<s.endIndex,
                             options: .ByComposedCharacterSequences) {
                                (substring, _, _, _) in
                                // The closure is called once per composed character sequence.
                                if substring != nil {
                                    count += 1
                                }
}
print(count) // 4

Another option is to find the range of the composed character sequences at a given index:

let s = "👨‍❤️‍💋‍👨"
if s.rangeOfComposedCharacterSequenceAtIndex(s.startIndex) == s.characters.indices {
    print("This is a single composed character")
}

As String extension methods:

// Swift 2.2:
extension String {
    var composedCharacterCount: Int {
        var count = 0
        enumerateSubstringsInRange(characters.indices, options: .ByComposedCharacterSequences) {
            (_, _, _, _) in count += 1
        }
        return count
    }

    var isSingleComposedCharacter: Bool {
        return rangeOfComposedCharacterSequenceAtIndex(startIndex) == characters.indices
    }
}

// Swift 3:
extension String {
    var composedCharacterCount: Int {
        var count = 0
        enumerateSubstrings(in: startIndex..<endIndex, options: .byComposedCharacterSequences) {
            (_, _, _, _) in count += 1
        }
        return count
    }

    var isSingleComposedCharacter: Bool {
        return rangeOfComposedCharacterSequence(at: startIndex) == startIndex..<endIndex
    }
}

Examples:

"👍🏼".composedCharacterCount // 1
"👍🏼".characters.count       // 2

"👨‍❤️‍💋‍👨".composedCharacterCount // 1
"👨‍❤️‍💋‍👨".characters.count       // 4

"🇩🇪🇨🇦".composedCharacterCount // 2
"🇩🇪🇨🇦".characters.count       // 1

As you see, the number of Swift characters (extended grapheme clusters) can be more or less than the number of composed character sequences.
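
One caveat: `isSingleComposedCharacter` above assumes a non-empty string and traps on an empty one. A minimal sketch of a guarded variant follows. The property name is hypothetical, and returning `false` for the empty string is an assumption that a caller may well want to decide differently:

```swift
import Foundation

extension String {
    /// Hypothetical variant of `isSingleComposedCharacter` that reports
    /// `false` for the empty string instead of trapping.
    var isSingleComposedCharacterOrFalseIfEmpty: Bool {
        // Asking for the composed character sequence at `startIndex`
        // is only valid when the string has at least one character.
        guard !isEmpty else { return false }
        return rangeOfComposedCharacterSequence(at: startIndex) == startIndex..<endIndex
    }
}

print("".isSingleComposedCharacterOrFalseIfEmpty)   // false
print("a".isSingleComposedCharacterOrFalseIfEmpty)  // true
print("ab".isSingleComposedCharacterOrFalseIfEmpty) // false
```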

Martin R
  • This is awesome! How does this work for 👨‍❤️‍👨? It has 6 unicodeScalars, 3 characters, and the rangeOfComposedCharacterSequenceAtIndex(startIndex) is 0..<8. – Manuel Aug 23 '16 at 15:37
  • @Manuel: It works also for flags (see added examples) which I find even more surprising. – Martin R Aug 23 '16 at 15:40
  • Shouldn't "🇩🇪🇨🇦".characters.count be 4 instead of 1? – Manuel Aug 23 '16 at 15:43
  • @Manuel: The "regional indicators" are a strange thing, compare http://stackoverflow.com/questions/26862282/swift-countelements-return-incorrect-value-when-count-flag-emoji. Any sequence of Regional_Indicator (RI) characters is considered a single grapheme cluster. – Martin R Aug 23 '16 at 15:45
  • @Manuel: `print(Array("👨‍❤️‍👨".unicodeScalars))` might be instructive. There are three Swift characters, but 6 Unicode scalars (including U+200D ZERO WIDTH JOINER). The Unicode scalars > U+FFFF consume two index positions (they are internally stored as a UTF-16 surrogate pair). – Unicode is fun! – Martin R Aug 23 '16 at 15:54
  • Very useful answer, so thanks. isSingleComposedCharacter returns an "index 0 out of range" on an empty string in my Swift 3 test, so anyone implementing the extension needs to ensure the string they're testing isn't empty before the test or that the extension includes a test for an empty string. – JKaz May 11 '17 at 20:08
  • @JKaz: Thank you for the feedback! Yes, the code indeed assumes a non-empty string. The method could check for an empty string, but should that count as "single-composed" or not? Perhaps better leave that choice to the caller (as you said). – Martin R May 11 '17 at 21:16
  • @MartinR For Objective-C? – vipinsaini0 Apr 17 '18 at 10:49
  • @VipinSaini: `enumerateSubstringsInRange` and `rangeOfComposedCharacterSequenceAtIndex` are methods of the `NSString` class in the Foundation library, and can be used from Objective-C without problems. – Martin R Apr 17 '18 at 10:52
  • @MartinR can you please provide a sample code for this? – vipinsaini0 Apr 17 '18 at 10:55
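
The scalar breakdown discussed in the comments above can be reproduced with a small sketch. `describeScalars` is a hypothetical helper, and the six-scalar sequence used here (man, ZWJ, heart, variation selector, ZWJ, man) is an assumption chosen to match the "6 unicodeScalars" and `0..<8` figures mentioned there:

```swift
// Hypothetical helper: one line per Unicode scalar, with its UTF-16 length.
func describeScalars(_ s: String) -> [String] {
    return s.unicodeScalars.map { scalar in
        let hex = String(scalar.value, radix: 16, uppercase: true)
        // Scalars above U+FFFF are stored as a surrogate pair in UTF-16.
        let units = scalar.value > 0xFFFF ? 2 : 1
        let suffix = units == 1 ? "" : "s"
        return "U+\(hex) (\(units) UTF-16 unit\(suffix))"
    }
}

// Assumed example sequence: man, ZWJ, heavy black heart, VS-16, ZWJ, man.
let couple = "\u{1F468}\u{200D}\u{2764}\u{FE0F}\u{200D}\u{1F468}"
for line in describeScalars(couple) {
    print(line)
}
print(couple.unicodeScalars.count, couple.utf16.count) // 6 8
```

The two scalars above U+FFFF account for four of the eight UTF-16 code units, which is why the composed character range ends at 8 rather than 6.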