
I'm in the middle of doing some string manipulation using high-level Cocoa features like NSString and NSData as opposed to digging down to C-level things like working on arrays of chars.

For the love of it, +[NSString stringWithUTF8String:] sometimes returns nil on a perfectly good string that was created with -[NSString UTF8String] in the first place. One would assume that this happens only when the input is malformed. Here is an example of input that fails, in hex:

55 6B 66 51 35 59 4A 5C 6A 60 40 33 5F 45 58 60 9D 47 3F 6E 5E 
60 59 34 58 68 41 4B 61 4E 3F 41 46 00

and ASCII:

UkfQ5YJ\j`@3_EX`G?n^`Y4XhAKaN?AF

This is a randomly generated string, to test my subroutine.

const char *buffer = [randomNSString UTF8String]; // -UTF8String returns const char *
// .... doing things .... in the end, buffer is the same as before
NSString * result = [NSString stringWithUTF8String:buffer];
// yields nil

Edit: Just in case somebody didn't grasp the implicit question, here it is in -v mode:

Why does [NSString stringWithUTF8String:] sometimes return nil on a perfectly formed UTF8-String?

Joe Völker
  • Is there any chance the autorelease pool is drained between `-UTF8String` and `-stringWithUTF8String:`? –  Jun 07 '11 at 09:40
  • @Bavarious: Nope, `buffer` still is alive and kicking by the time `stringWithUTF8String:` is invoked. – Joe Völker Jun 07 '11 at 09:49
  • Could you post the original UTF-8 string that yielded that buffer? Maybe an `NSData` representation via `-dataUsingEncoding:` first, and then the buffer after `-UTF8String`. –  Jun 07 '11 at 09:54
  • There's a mismatch between the given ASCII and hex representations -- the 9D is not present in the ASCII. – walkytalky Jun 07 '11 at 10:14
  • Looking at the UTF8 spec, this buffer is *not* valid UTF8, so NSString is right to fail. So I guess the question is *why* isn't it right? If you cut out the middle man and just go `result=[NSString stringWithUTF8String:[randomNSString UTF8String]]` do you get a valid result? – walkytalky Jun 07 '11 at 10:27
  • How was `randomNSString` created? – Peter Hosey Jun 08 '11 at 02:14

2 Answers


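walkytalky is right. 0x9D is not legal in UTF-8 in this position. Bytes whose top two bits are 10 (0x80 through 0xBF) are reserved as continuation bytes; they never appear on their own, only after a lead byte that starts with two or more 1 bits (110xxxxx, 1110xxxx or 11110xxx). In your buffer the 9D follows 60, a plain ASCII byte, so the sequence is malformed and NSString is right to return nil.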
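
To see where a buffer goes wrong, a small structural check can help. The following is only a sketch in plain C (which also compiles inside an Objective-C file); the function name `first_invalid_utf8_offset` is made up for illustration, and it only checks lead/continuation byte structure. It does not reject overlong encodings, surrogates, or code points above U+10FFFF, so it is presumably less strict than whatever NSString does internally.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the offset at which the NUL-terminated
   buffer s first violates UTF-8 lead/continuation structure, or -1 if
   no structural problem is found. */
long first_invalid_utf8_offset(const char *s)
{
    assert(s != NULL);
    const unsigned char *p = (const unsigned char *)s;
    size_t i = 0;

    while (p[i] != 0) {
        unsigned char b = p[i];
        size_t extra;                              /* continuation bytes expected */

        if (b < 0x80)                extra = 0;    /* 0xxxxxxx: ASCII */
        else if ((b & 0xE0) == 0xC0) extra = 1;    /* 110xxxxx */
        else if ((b & 0xF0) == 0xE0) extra = 2;    /* 1110xxxx */
        else if ((b & 0xF8) == 0xF0) extra = 3;    /* 11110xxx */
        else return (long)i;                       /* stray 10xxxxxx, or 0xF8 and above */

        /* Every expected continuation byte must look like 10xxxxxx.
           The terminating NUL fails this test, so we never read past it. */
        for (size_t k = 1; k <= extra; k++) {
            if ((p[i + k] & 0xC0) != 0x80)
                return (long)(i + k);
        }
        i += extra + 1;
    }
    return -1;
}
```

Run on the hex dump from the question, this reports offset 16, which is exactly the 9D byte; on the ASCII version of the string (without the 9D) it returns -1.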

Jeff Laing

This is a bit of a stab in the dark because we don't have enough information to properly diagnose the problem.

If randomNSString no longer exists at the point where you allocate the memory for result, for instance, if it has been released in a reference counted environment or collected in a GC environment, it is possible that buffer points to memory that has been freed but not yet reused (which would explain why it is still the same).

However, creating a new NSString requires allocating memory, and the allocator might reuse the block that buffer points to, which would mean your UTF-8 string gets zapped by the internals of the new NSString. You can test this theory by logging the contents of buffer after failing to create result. Don't use the %s specifier though; print the hex bytes.
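
One way to print the hex bytes, as a rough sketch in plain C: the helper name `format_hex_bytes` is hypothetical, and it assumes the buffer is still NUL-terminated, which may not hold if the memory really has been reused.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: formats the bytes of the NUL-terminated buffer as
   two-digit uppercase hex values separated by spaces. Stops early if out
   is too small. Returns the number of bytes formatted. */
size_t format_hex_bytes(const char *buffer, char *out, size_t out_size)
{
    assert(buffer != NULL && out != NULL && out_size > 0);
    const unsigned char *p = (const unsigned char *)buffer;
    size_t used = 0;    /* characters written to out so far */
    size_t count = 0;   /* bytes of buffer formatted so far */

    out[0] = '\0';
    /* Each byte needs at most 3 characters (" XX") plus the terminator. */
    while (p[count] != 0 && used + 4 <= out_size) {
        used += (size_t)snprintf(out + used, out_size - used,
                                 count ? " %02X" : "%02X",
                                 (unsigned)p[count]);
        count++;
    }
    return count;
}
```

In the question's code you would call this once right after `-UTF8String` and again after `stringWithUTF8String:` returns nil, log both strings (for example with `NSLog(@"%s", out)`), and compare them. If the two dumps differ, the buffer was overwritten in between.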

JeremyP