How to get all characters of the font with CTFontCopyCharacterSet() in Swift?

Question

How does one get all characters of a font with CTFontCopyCharacterSet() in Swift on macOS?

The issue occurred when implementing the approach from an "OSX: CGGlyph to UniChar" answer in Swift.

func createUnicodeFontMap() {
    // Get all characters of the font with CTFontCopyCharacterSet().
    let cfCharacterSet: CFCharacterSet = CTFontCopyCharacterSet(ctFont)

    //    
    let cfCharacterSetStr = "\(cfCharacterSet)"
    print("CFCharacterSet: \(cfCharacterSet)")  

    // Map all Unicode characters to corresponding glyphs
    var unichars = [UniChar](…NYI…) // NYI: lacking unichars for CFCharacterSet
    var glyphs = [CGGlyph](repeating: 0, count: unichars.count)
    guard CTFontGetGlyphsForCharacters(
        ctFont, // font: CTFont
        &unichars, // characters: UnsafePointer<UniChar>
        &glyphs, // UnsafeMutablePointer<CGGlyph>
        unichars.count // count: CFIndex
        )
        else {
            return
    }

    // For each Unicode character and its glyph, 
    // store the mapping glyph -> Unicode in a dictionary.
    // ... NYI
}

What to do with the CFCharacterSet to retrieve the actual characters has been elusive. Autocompletion on the cfCharacterSet instance shows no relevant methods.

The Core Foundation > CFCharacterSet documentation appears to have methods for creating another CFCharacterSet, but nothing that provides an array, list, or string of unichars from which to build a mapped dictionary.

Note: I'm looking for a solution that is not specific to iOS, unlike "Get all available characters from a font", which uses UIFont.

Martin R · Accepted Answer · 2019-06-27T11:19:15.413

CFCharacterSet is toll-free bridged with the Cocoa Foundation counterpart NSCharacterSet, and can be bridged to the corresponding Swift value type CharacterSet:

let charset = CTFontCopyCharacterSet(ctFont) as CharacterSet

Then the approach from "NSArray from NSCharacterSet" can be used to enumerate all Unicode scalar values of that character set (including non-BMP code points, i.e. Unicode scalar values greater than U+FFFF).

CTFontGetGlyphsForCharacters() expects non-BMP characters as surrogate pairs, i.e. as an array of UTF-16 code units.
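
A minimal sketch of what that means for a single scalar, using only the Swift standard library. The two sample scalars are arbitrary choices for illustration:

```swift
// A BMP scalar fits in one UTF-16 code unit; a non-BMP scalar needs two.
let bmpScalar: Unicode.Scalar = "A"             // U+0041
let emojiScalar: Unicode.Scalar = "\u{1F600}"   // U+1F600, outside the BMP

let bmpUnits = Array(bmpScalar.utf16)       // [0x0041]
let emojiUnits = Array(emojiScalar.utf16)   // [0xD83D, 0xDE00], a surrogate pair

// The array length is the count to pass along with the code units.
print(bmpUnits.count, emojiUnits.count)
```

This is why the glyph buffer in the function is sized with `utf16.count` rather than a fixed 1.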

Putting it together, the function would look like this:

import CoreText
import Foundation

func createUnicodeFontMap(ctFont: CTFont) -> [CGGlyph : UnicodeScalar] {

    let charset = CTFontCopyCharacterSet(ctFont) as CharacterSet

    var glyphToUnicode = [CGGlyph : UnicodeScalar]() // Start with empty map.

    // Enumerate all Unicode scalar values from the character set:
    for plane: UInt8 in 0...16 where charset.hasMember(inPlane: plane) {
        for unicode in UTF32Char(plane) << 16 ..< UTF32Char(plane + 1) << 16 {
            if let uniChar = UnicodeScalar(unicode), charset.contains(uniChar) {

                // Get glyph for this `uniChar` ...
                let utf16 = Array(uniChar.utf16)
                var glyphs = [CGGlyph](repeating: 0, count: utf16.count)
                if CTFontGetGlyphsForCharacters(ctFont, utf16, &glyphs, utf16.count) {
                    // ... and add it to the map.
                    glyphToUnicode[glyphs[0]] = uniChar
                }
            }
        }
    }

    return glyphToUnicode
}

score 3 · Answer 2 · answered Jun 27 '19 at 05:27

You can do something like this.

let cs = CTFontCopyCharacterSet(font) as NSCharacterSet
let bitmapRepresentation = cs.bitmapRepresentation

The format of the bitmap is defined in the reference page for CFCharacterSetCreateWithBitmapRepresentation.
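
A rough sketch of decoding that bitmap, assuming the layout described on that reference page: 8192 bytes for the BMP, then, for each further plane that has members, one plane-index byte followed by another 8192 bytes. The function name `scalars(fromBitmap:)` is hypothetical, and the sketch only skips empty bytes rather than wider words:

```swift
import Foundation

/// Decodes a character-set bitmap into the Unicode scalars it contains.
/// Assumed layout: 8192 bytes for plane 0, then per extra plane
/// one plane-index byte followed by 8192 bytes.
func scalars(fromBitmap bitmap: Data) -> [Unicode.Scalar] {
    let bytes = [UInt8](bitmap)
    let planeSize = 8192
    var result: [Unicode.Scalar] = []

    // Appends every scalar whose bit is set in one 8192-byte plane bitmap.
    func decodePlane(_ plane: UInt32, at start: Int) {
        // Bytes equal to zero cover 8 absent code points and are skipped.
        for offset in 0..<planeSize where bytes[start + offset] != 0 {
            let byte = bytes[start + offset]
            for bit in 0..<8 where byte & (1 << bit) != 0 {
                let value = (plane << 16) | UInt32(offset * 8 + bit)
                // Surrogates and out-of-range values yield nil and are dropped.
                if let scalar = Unicode.Scalar(value) {
                    result.append(scalar)
                }
            }
        }
    }

    guard bytes.count >= planeSize else { return result }
    decodePlane(0, at: 0)

    // Each remaining chunk is 1 plane-index byte + 8192 bitmap bytes.
    var index = planeSize
    while index + 1 + planeSize <= bytes.count {
        decodePlane(UInt32(bytes[index]), at: index + 1)
        index += 1 + planeSize
    }
    return result
}
```

With the character set above, this would be called as `scalars(fromBitmap: cs.bitmapRepresentation)`.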

idz

Casting with `as CharacterSet` (https://developer.apple.com/documentation/foundation/characterset) would also lead to the `.bitmapRepresentation` approach. – marc-medley Jun 27 '19 at 19:37

Good to know. Also worth mentioning that with the `bitmapRepresentation` you can be quite a bit more efficient than the accepted answer, which has to loop through 65536 code points for each plane. Using the bitmap representation, you can compare a `UInt8` to `UInt64` at an index in the buffer to zero and so skip groups of 8 to 64 code points that have no members. – idz Jun 27 '19 at 20:06
Interesting. I was unfamiliar with the `bitmapRepresentation` data format, so the next steps to a mapped dictionary of UnicodeScalar were TBD, although likely not difficult. A secondary, longer-term goal is to have some [CGFont and CTFont functionality in portable Swift (e.g. Ubuntu, etc)](https://stackoverflow.com/questions/56782036/cgfont-and-ctfont-functionality-in-portable-swift-e-g-ubuntu-etc). My workaround for now (and the use case for this question) is a one-time pass to extract some desired font metrics to a portable format. I may revisit this approach if better performance is needed. Thx. – marc-medley Jun 27 '19 at 21:23
Yeah, I have no idea if the time taken to do either approach would ever be significant, just thought I'd mention it for completeness' sake. – idz Jun 27 '19 at 21:33

Just enumerating the character set of "Helvetica" takes 6 milliseconds on my MacBook. By the way, note that planes 1...15 in the bitmapRepresentation are not aligned at 8-byte boundaries; that must be taken care of when interpreting the data as UInt64 numbers. – Martin R Jun 28 '19 at 07:05
@MartinR Good to know! I agree that once you're down in that range the user isn't even going to notice. – idz Jun 28 '19 at 18:05