
Swift offers a series of string encodings (`String.Encoding`). As of the time I'm writing this, none of them are documented, which makes this absurdly more confusing than it should be...

I can understand that .ascii means it's ASCII encoded, .utf8 means the string is UTF-8 encoded, and .utf16BigEndian means the string is UTF-16 but big-endian. These obviously map to real text encodings.

Then there's .unicode. There is no "Unicode" encoding. The Unicode standard defines UTF-8, UTF-16, and UTF-32, which, as I said above, are already defined in Swift.

Is it a fancy one which figures out the best one for the system? Is it an alias for .utf8? Is it some weird Apple Unicode encoding?
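
For concreteness, here's a minimal sketch of the kind of check I mean, using Foundation's `data(using:)` (the byte dumps in the comments are what I'd expect on a little-endian Apple platform; `.unicode` is the open question):

```swift
import Foundation

// Dump the bytes "Aé" produces under a few encodings.
let s = "Aé"
let encodings: [(String, String.Encoding)] = [
    (".ascii", .ascii),
    (".utf8", .utf8),
    (".utf16BigEndian", .utf16BigEndian),
    (".unicode", .unicode)
]
for (name, encoding) in encodings {
    if let data = s.data(using: encoding) {
        print(name, data.map { String(format: "%02X", $0) }.joined(separator: " "))
    } else {
        print(name, "nil (not representable)")
    }
}
// .ascii          -> nil ("é" has no ASCII representation)
// .utf8           -> 41 C3 A9
// .utf16BigEndian -> 00 41 00 E9
// .unicode        -> ...which is exactly what I'm asking about
```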

Ky -
  • In many frameworks, "unicode" usually means UTF-16 in the underlying platform's native endian. – Remy Lebeau Oct 09 '19 at 00:55
  • @RemyLebeau really? I figured it'd be UTF-16. Do you know why, or have somewhere I can read more about that? – Ky - Oct 09 '19 at 01:26
  • This is a presentation I saw a couple of months ago at a Swift conference. It filled in a lot of blanks for me re: unicode. https://github.com/rachelhyman/emoji/blob/master/presentation.pdf – Adrian Oct 09 '19 at 03:40
  • Thanks! Also I must've been really tired in my previous comment; I meant I figured it'd be UTF-8. I'll look through it! – Ky - Oct 09 '19 at 16:00

1 Answer


It would appear to be an alias for .utf16. From CFString.h:

#define kCFStringEncodingInvalidId (0xffffffffU)
typedef CF_ENUM(CFStringEncoding, CFStringBuiltInEncodings) {
    kCFStringEncodingMacRoman = 0,
    kCFStringEncodingWindowsLatin1 = 0x0500, /* ANSI codepage 1252 */
    kCFStringEncodingISOLatin1 = 0x0201, /* ISO 8859-1 */
    kCFStringEncodingNextStepLatin = 0x0B01, /* NextStep encoding*/
    kCFStringEncodingASCII = 0x0600, /* 0..127 (in creating CFString, values greater than 0x7F are treated as corresponding Unicode value) */
    kCFStringEncodingUnicode = 0x0100, /* kTextEncodingUnicodeDefault  + kTextEncodingDefaultFormat (aka kUnicode16BitFormat) */
    kCFStringEncodingUTF8 = 0x08000100, /* kTextEncodingUnicodeDefault + kUnicodeUTF8Format */
    kCFStringEncodingNonLossyASCII = 0x0BFF, /* 7bit Unicode variants used by Cocoa & Java */

    kCFStringEncodingUTF16 = 0x0100, /* kTextEncodingUnicodeDefault + kUnicodeUTF16Format (alias of kCFStringEncodingUnicode) */
    kCFStringEncodingUTF16BE = 0x10000100, /* kTextEncodingUnicodeDefault + kUnicodeUTF16BEFormat */
    kCFStringEncodingUTF16LE = 0x14000100, /* kTextEncodingUnicodeDefault + kUnicodeUTF16LEFormat */

    kCFStringEncodingUTF32 = 0x0c000100, /* kTextEncodingUnicodeDefault + kUnicodeUTF32Format */
    kCFStringEncodingUTF32BE = 0x18000100, /* kTextEncodingUnicodeDefault + kUnicodeUTF32BEFormat */
    kCFStringEncodingUTF32LE = 0x1c000100 /* kTextEncodingUnicodeDefault + kUnicodeUTF32LEFormat */
};

You can confirm this with:

print(String.Encoding.unicode.rawValue, String.Encoding.utf16.rawValue)
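
Both should print 10 (the old NSUnicodeStringEncoding value). If you want to tie it back to the CFString.h constants above, a sketch like the following (assuming an Apple platform, where `String.Encoding.rawValue` is the legacy NSStringEncoding value and CFStringConvertNSStringEncodingToEncoding is available) should report 0x0100 for both:

```swift
import Foundation

// Sketch (Apple platforms): map String.Encoding's raw NSStringEncoding value
// onto the CFStringEncoding constants from CFString.h shown above.
let cfUnicode = CFStringConvertNSStringEncodingToEncoding(String.Encoding.unicode.rawValue)
let cfUTF16   = CFStringConvertNSStringEncodingToEncoding(String.Encoding.utf16.rawValue)
print(String(format: "0x%04X 0x%04X", cfUnicode, cfUTF16))  // expected: 0x0100 0x0100
```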
Rob
  • I did notice that in Swift, but was concerned this was some platform-dependent thing. There are other such seemingly-generated headers around Cocoa which give me pause, like `TargetConditionals.h`. Hopefully this is not one of them. Can you think of any way to test this possibility? – Ky - Oct 09 '19 at 00:34
  • Feel free to peruse the source of truth, [swift-corelibs-foundation](https://github.com/apple/swift-corelibs-foundation). E.g., see [`String.Encoding` definitions here](https://github.com/apple/swift-corelibs-foundation/blob/c82e1deba471bc9d26516c332f25578551356d9f/Foundation/StringEncodings.swift), mappings between [`NSString` and `CFString` codings here](https://github.com/apple/swift-corelibs-foundation/blob/155f1ce1965effe55289477507a6f9fbdc8fe333/CoreFoundation/String.subproj/CFStringUtilities.c#L210), etc. Just the same, I’d stick with `.utf8`/`16` variants, not `.unicode`. – Rob Oct 09 '19 at 00:40