For a random string generator, I thought it would be nice to use CharacterSet as the input type for the alphabet, since the predefined sets such as CharacterSet.lowercaseLetters are obviously useful (even if they may contain a more diverse range of characters than you'd expect).
However, apparently you can only query character sets for membership; you can neither enumerate nor index them. All we get is _.bitmapRepresentation, an 8 KB chunk of data with an indicator bit for every (?) character. But even if you peel out individual bits by index i (which is less than nice, since it means going through byte-oriented Data), Character(UnicodeScalar(i)) does not give the correct letter. Which means that the format is somewhat obscure, and, of course, it's not documented.
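For what it's worth, here is a sketch of decoding the bitmap. It assumes the layout described in Apple's NSCharacterSet and CFCharacterSet reference (rather than the Swift CharacterSet docs): code point n is bit (n & 7) of byte (n >> 3), least significant bit first; the first 8192 bytes cover the BMP, and each supplementary plane that is present follows as one plane-index byte plus 8192 bitmap bytes. The function name scalars(in:) is made up for illustration.

```swift
import Foundation

// Hypothetical helper: decodes CharacterSet.bitmapRepresentation into scalars.
// ASSUMPTION: layout as described for NSCharacterSet.bitmapRepresentation and
// CFCharacterSetCreateWithBitmapRepresentation:
//   - bytes 0..<8192: BMP bitmap, code point n at byte n >> 3, bit n & 7 (LSB first)
//   - then, per supplementary plane: 1 plane-index byte + 8192 bitmap bytes
func scalars(in set: CharacterSet) -> [Unicode.Scalar] {
    let bytes = [UInt8](set.bitmapRepresentation)
    var result: [Unicode.Scalar] = []

    // Appends every scalar whose bit is set in the 8192-byte bitmap at `offset`.
    func decodePlane(_ plane: UInt32, from offset: Int) {
        for byteIndex in 0..<8192 {
            let byte = bytes[offset + byteIndex]
            if byte == 0 { continue } // fast path for empty bytes
            for bit in 0..<8 where byte & (1 << bit) != 0 {
                let value = (plane << 16) | UInt32(byteIndex << 3 | bit)
                // Surrogate code points are not valid scalars; the init returns nil.
                if let scalar = Unicode.Scalar(value) {
                    result.append(scalar)
                }
            }
        }
    }

    guard bytes.count >= 8192 else { return result }
    decodePlane(0, from: 0)

    // Walk the optional supplementary-plane chunks.
    var offset = 8192
    while offset + 8193 <= bytes.count {
        decodePlane(UInt32(bytes[offset]), from: offset + 1)
        offset += 8193
    }
    return result
}
```

If the assumed layout holds, scalars(in: CharacterSet(charactersIn: "abc")) should yield a, b and c. Note that this still scans 8192 bytes per plane present, however sparse the set is.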
Admittedly, we can iterate over all characters (plane by plane), but that is a bad idea cost-wise: enumerating a 20-character set may require testing tens of thousands of code points. In CS terms, bit vectors are a (very) poor representation for sparse sets. Why they chose to make the trade-off this way here, I have no idea.
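To make the cost concrete, a sketch of the brute-force route using only membership tests, plus the random string generator built on top of it. members(of:in:) and randomString(length:from:) are hypothetical helpers, and the default range covers only the BMP (65,536 code points), so supplementary-plane characters are missed unless a wider range (up to 0x10FFFF) is passed.

```swift
import Foundation

// Hypothetical helper: brute-force enumeration via CharacterSet.contains.
// Cost grows with the size of the scanned range, not the size of the set:
// the default BMP range means 65,536 membership tests even for a 20-element set.
func members(of set: CharacterSet,
             in range: ClosedRange<UInt32> = 0...0xFFFF) -> [Unicode.Scalar] {
    range.compactMap { value -> Unicode.Scalar? in
        // Unicode.Scalar(_:) returns nil for surrogates, so they are skipped.
        guard let scalar = Unicode.Scalar(value), set.contains(scalar) else { return nil }
        return scalar
    }
}

// Hypothetical generator: uses the enumerated scalars as the alphabet.
func randomString(length: Int, from set: CharacterSet) -> String {
    let alphabet = members(of: set)
    precondition(!alphabet.isEmpty, "alphabet must not be empty")
    var scalars = String.UnicodeScalarView()
    for _ in 0..<length {
        scalars.append(alphabet.randomElement()!)
    }
    return String(scalars)
}
```

In practice you would compute the alphabet once and reuse it, since every call to randomString(length:from:) repeats the full scan.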
Am I missing something here, or is CharacterSet just another dead end in the Foundation API?