First off, the solution from "Convert or Print CGPDFStringRef string" doesn't work for ligatures.
I'm reading text from a PDF and trying to convert it to an NSString. I can get a byte array of text using Apple's CGPDFScanner, in the form of a CGPDFString. The "fi" ligature character is giving me trouble: when I look at my byte array in the debugger, I see a '\f'.
So, for simplicity's sake, let's say that I have this char:
unsigned char myLigatureFromPDF = '\f';
Ultimately I'd like to convert it to this (the Unicode value for the "fi" ligature):
unichar whatIWant = 0xFB01;
This is my failed attempt (I copied this from PDFKitten, by the way):
const char str[] = {myLigatureFromPDF, '\0'};
NSString* stringEncodedLigature = [NSString stringWithCString:str encoding:NSUTF8StringEncoding];
unichar encodedLigature = [stringEncodedLigature characterAtIndex:0];
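To make the goal concrete, here is a minimal sketch in plain C (which also compiles inside an Objective-C file) of the kind of mapping I think I need. The function name `unicodeForGlyphCode` is hypothetical, and the table assumes the font uses a TeX OT1-style layout in which codes 0x0B through 0x0F hold the f-ligatures. I don't know that my PDF's font actually does this; presumably the real mapping has to come from the font's own encoding information. The sketch also suggests why the UTF-8 attempt above can't produce 0xFB01: the byte 0x0C is already a valid one-byte UTF-8 sequence (the form feed control character), so decoding it yields U+000C unchanged.

```c
#include <stdint.h>

/* Hypothetical helper: map a single-byte glyph code from the PDF string
 * to a UTF-16 code unit (the same width as Foundation's unichar).
 * ASSUMPTION: the font follows a TeX OT1-style layout, where 0x0B..0x0F
 * are the f-ligatures. A real PDF may use a different encoding. */
static uint16_t unicodeForGlyphCode(unsigned char code)
{
    switch (code) {
    case 0x0B: return 0xFB00; /* LATIN SMALL LIGATURE FF  */
    case 0x0C: return 0xFB01; /* LATIN SMALL LIGATURE FI  */
    case 0x0D: return 0xFB02; /* LATIN SMALL LIGATURE FL  */
    case 0x0E: return 0xFB03; /* LATIN SMALL LIGATURE FFI */
    case 0x0F: return 0xFB04; /* LATIN SMALL LIGATURE FFL */
    default:   return code;   /* pass other bytes through (only right for ASCII) */
    }
}
```

With this, `unicodeForGlyphCode(myLigatureFromPDF)` would give the 0xFB01 I'm after, but only under the encoding assumption stated above.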
If anyone can tell me how to do this, that would be great. Also, as a side note: how does the debugger interpret the unencoded byte array? In other words, when I hover over the array, how does it know to show a '\f'?
Thanks!