You are dealing with a surrogate pair. Surrogate pairs are UTF-16's way of encoding characters outside the Basic Multilingual Plane, which cannot be represented as a single Char. You can check that by attempting to define one as a char literal:
val someChar = '😀' // Error: Too many characters in a character literal
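To make the point concrete, here is a minimal sketch that inspects such a character as a String instead. It assumes U+1F600 as a stand-in example, written with escapes so the two UTF-16 code units are visible:

```kotlin
fun main() {
    // U+1F600 spelled out as its two UTF-16 code units (high, then low surrogate)
    val s = "\uD83D\uDE00"

    println(s.length)                // 2: length counts Chars, not characters
    println(s[0].isHighSurrogate())  // true
    println(s[1].isLowSurrogate())   // true
    println(s.hasSurrogatePairAt(0)) // true
}
```

So the String holds one visible character but two Chars, which is why a single char literal cannot contain it.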
So how do you count those properly? Kotlin's standard library has a function for detecting them (hasSurrogatePairAt), which you could wrap in an extension function like this:
fun String.countSurrogatePairs() = withIndex().count {
    hasSurrogatePairAt(it.index)
}
Usage:
println("😀".countSurrogatePairs()) // 1
println("😀😀".countSurrogatePairs()) // 2
Python, by contrast, already handles this for you: its strings are sequences of code points, so len counts such a character once.
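If what you want is the Python-style length, one option is to count code points directly. This is a sketch assuming Kotlin on the JVM, where codePointCount is available on String; codePointLength is a hypothetical helper name, not a standard library function:

```kotlin
// Hypothetical helper: counts Unicode code points rather than UTF-16 code units.
// Assumes Kotlin/JVM, where String.codePointCount is available.
fun String.codePointLength(): Int = codePointCount(0, length)

fun main() {
    val s = "a\uD83D\uDE00b" // 'a', U+1F600 as a surrogate pair, 'b'

    println(s.length)            // 4 UTF-16 code units
    println(s.codePointLength()) // 3 code points
}
```

For well-formed strings, this is the same as length minus the number of surrogate pairs.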