2

Can someone please assist me with converting special characters to something that can be correctly represented in an RTF file?

I am taking text stored in a string on the iPad and outputting it as an RTF file using NSASCIIStringEncoding. So far so good. What I've failed to do is take into account special characters (e.g. tilde, umlaut, accent). Sorry, RoW!

The most universal RTF format seems to want 8-bit text encoding with code page escapes (a backslash and an apostrophe followed by two hexadecimal digits). So n with tilde (ñ) would be \'f1.
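To make the \'hh form concrete, here is a minimal sketch in plain C (which also compiles as Objective-C) of escaping text that has already been converted to a single-byte code page such as Windows-1252. The name rtf_escape_bytes and its buffer-based signature are made up for illustration, and it assumes the RTF header declares the same code page the bytes are in. Backslashes and braces are not handled here.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not part of any library.
   Copies ASCII bytes as-is and writes every byte above 0x7F as \'hh.
   Returns the number of chars written (excluding the NUL),
   or -1 if the output buffer is too small. */
int rtf_escape_bytes(const unsigned char *in, size_t len,
                     char *out, size_t outsize)
{
    size_t used = 0;

    if (outsize == 0)
        return -1;
    out[0] = '\0';

    for (size_t i = 0; i < len; i++) {
        int n;
        if (in[i] > 0x7F)
            n = snprintf(out + used, outsize - used, "\\'%02x", (unsigned)in[i]);
        else
            n = snprintf(out + used, outsize - used, "%c", (int)in[i]);

        /* snprintf reports the length it wanted; treat truncation as failure. */
        if (n < 0 || (size_t)n >= outsize - used)
            return -1;
        used += (size_t)n;
    }
    return (int)used;
}
```

For the three bytes 'a', 0xF1, 'o' this writes a\'f1o, matching the ñ example above.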

The only solution that occurs to me is to convert to NSUTF8StringEncoding and then use stringByReplacingOccurrencesOfString, but there are a lot of characters and it seems tedious to have to replace every one of them manually. Is there a more efficient way that is escaping me? (pun intended) :)

Thanks for any suggestions.

DenVog
  • Just a thought from similar problems in web development... use Unicode throughout? – Dave Everitt Sep 17 '10 at 18:27
  • I can't help you with the specifics of the RTF format, but it's worth pointing out that as of iOS 4.0 Apple have (finally) introduced regular expression support - see NSRegularExpression. Depending on whether you need to support legacy 3.x devices or not, I'd be tempted to use regexes to solve this particular problem, since it's precisely what they were designed for. – Echelon Sep 17 '10 at 18:36
  • Thanks for sharing that. It's an iPad app, so for the time being I have to support v3.2. – DenVog Sep 17 '10 at 19:20

2 Answers

5

@falconcreek's answer saved me lots of time writing code to cope with a wider range of cases, including, say, Chinese characters (as requested by DenVog). In particular, it's important to check for "\", "{" and "}", as these are used by the RTF format. (See How to output unicode string to RTF (using C#), for example.) The following category on NSString copes with a string such as:

The quick \ slow {brown} fox “slurped” lazily on his π-latté, while Faye Wong (王菲) played in the background.

#import <Foundation/Foundation.h>

@interface NSString (TR)
- (NSString *)stringFormattedRTF;
@end

@implementation NSString (TR)

#define backslash 0x5C
#define openCurlyBrace 0x7B
#define closeCurlyBrace 0x7D

- (NSString *)stringFormattedRTF
{
    NSMutableString *result = [NSMutableString string];

    for (NSUInteger index = 0; index < [self length]; index++)
    {
        unichar unicodeCharacter = [self characterAtIndex: index];

        if (unicodeCharacter == backslash || unicodeCharacter == openCurlyBrace || unicodeCharacter == closeCurlyBrace)
        {
            // Reserved RTF characters are escaped with a leading backslash.
            [result appendFormat: @"\\%c", unicodeCharacter];
        }
        else if (unicodeCharacter > 127)
        {
            // \uN takes a signed 16-bit decimal, so values above 32767 are written as negatives.
            [result appendFormat:@"\\uc0\\u%d ", (short)unicodeCharacter];
        }
        else
        {
            // Plain ASCII passes through unchanged.
            [result appendFormat:@"%c", unicodeCharacter];
        }
    }
    return result;
}

@end

Side note: Microsoft provide the RTF 1.9.1 spec, which is really helpful if you want to output RTF. Wikipedia says (as of May 2012) this is the most recent version. Google tends to kick up much older RTF specs.
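For anyone who wants to check the escaping logic outside of Foundation, here is a rough plain-C mirror of the loop in the category (plain C also compiles as Objective-C). It works on raw UTF-16 code units, which is what characterAtIndex: returns. The name rtf_escape_utf16 and the buffer-based signature are made up for illustration. One detail worth calling out: the RTF spec defines the \uN value as a signed 16-bit decimal, so this sketch writes code units above 32767 as negative numbers.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper, not part of any library.
   Escapes UTF-16 code units for RTF the same way the category does:
   \ { } get a leading backslash, anything above 127 becomes \uc0\uN
   followed by a space, and plain ASCII is copied through.
   Returns chars written (excluding the NUL), or -1 if out is too small. */
int rtf_escape_utf16(const uint16_t *units, size_t count,
                     char *out, size_t outsize)
{
    size_t used = 0;

    if (outsize == 0)
        return -1;
    out[0] = '\0';

    for (size_t i = 0; i < count; i++) {
        uint16_t u = units[i];
        int n;

        if (u == 0x5C || u == 0x7B || u == 0x7D) {
            n = snprintf(out + used, outsize - used, "\\%c", (int)u);
        } else if (u > 127) {
            /* Map 32768..65535 onto -32768..-1 for the signed \uN value. */
            int value = (u > 32767) ? (int)u - 65536 : (int)u;
            n = snprintf(out + used, outsize - used, "\\uc0\\u%d ", value);
        } else {
            n = snprintf(out + used, outsize - used, "%c", (int)u);
        }

        /* Treat truncation as failure rather than emitting half an escape. */
        if (n < 0 || (size_t)n >= outsize - used)
            return -1;
        used += (size_t)n;
    }
    return (int)used;
}
```

With this, é (code unit 0x00E9) comes out as \uc0\u233 and the code unit 0x83F2 (菲 in the sample sentence) comes out as \uc0\u-31758, each followed by a space.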

Obliquely
  • Thanks for sharing this. The braces came back to bite me, and this helped me sort them out. Thanks for following up on the thread. – DenVog Sep 25 '12 at 14:22
2

Check the value of characterAtIndex: if it is > 127, it is not ASCII, so escape the character.

Something like the following:

- (NSString *)stringFormattedRTF:(NSString *)inputString
{
    NSMutableString *result = [NSMutableString string];

    for ( NSUInteger index = 0; index < [inputString length]; index++ ) {
        NSString *temp = [inputString substringWithRange:NSMakeRange( index, 1 )];
        unichar tempchar = [inputString characterAtIndex:index];

        if ( tempchar > 127) {
            // Non-ASCII: write as \'hh. Only values up to 0xFF fit in two hex digits;
            // anything larger needs a different escape, such as \uN.
            [result appendFormat:@"\\'%02x", tempchar];
        } else {
            // ASCII passes through unchanged.
            [result appendString:temp];
        }
    }
    return result;
}
falconcreek
  • This is not working as expected. Will update when a working solution is found – falconcreek Sep 18 '10 at 00:48
  • Thanks for the proposed answer and follow-up. – DenVog Sep 19 '10 at 15:12
  • That got it. Thanks very much falconcreek! – DenVog Sep 20 '10 at 20:05
  • The above is working great for "special characters" like umlauts and accents. Any suggestions on how to handle two-byte characters, such as Japanese and Chinese? It seems to me that the above should already be escaping those, but right now those characters are being converted to ????. Thanks. – DenVog Oct 03 '10 at 20:19