I am confused about the byte representation of an emoji encoded in UTF-8. My understanding is that UTF-8 characters are variable-width, up to 4 bytes each.
When I encode the ❤️ emoji as UTF-8 on iOS 13, I get back 6 bytes:
NSString* heartEmoji = @"❤️";
NSData* utf8 = [heartEmoji dataUsingEncoding:NSUTF8StringEncoding];
NSLog(@"%@", utf8); // {length = 6, bytes = 0xe29da4efb88f}
If I reverse the operation, consuming only the first 3 bytes, I get a plain Unicode heart back:
uint8_t bytes[3] = { 0 }; // BYTE is a Windows type; uint8_t works on iOS
[utf8 getBytes:bytes length:3];
NSString* decoded = [[NSString alloc] initWithBytes:bytes length:3 encoding:NSUTF8StringEncoding];
NSLog(@"%@", decoded); // ❤
Note that I used the heart only as an example; I tried many emoji, and most are 4 bytes in UTF-8, but some are 6.
Do I have some faulty assumptions about UTF-8? Is there anything I can do to represent every emoji in at most 4 bytes of UTF-8?