I'm having problems converting a string to something readable. I'm using

NSString *substring = [NSString stringWithUTF8String:[symbol.data cStringUsingEncoding:NSUTF8StringEncoding]];

but I can't convert \U7ab6\U51b1 into '

It shows as 窶冱, which is not what I want; it should show as an '. Can anyone help me?

munchine
  • ? The character U+7AB6 is 窶 and U+51B1 is definitely 冱. How would that sequence ever represent an apostrophe? – bobince Mar 27 '11 at 11:33
  • Hi bobince, it is not an apostrophe but it looks like one. I pasted it here from a Word document; the first character is an apostrophe ' that is shown as a ’ and was produced by the combination \U7ab6\U51b1. I just want it shown as ’ – munchine Mar 27 '11 at 23:18

1 Answer

it is shown as a ’

That's character U+2019 RIGHT SINGLE QUOTATION MARK.

What has happened is that the character sequence ’s was submitted to you in the UTF-8 encoding, which comes out as the bytes:

’          s
E2 80 99   73

That byte sequence has then, incorrectly, been interpreted as if it were encoded in Windows code page 932 (Japanese; more or less Shift-JIS):

E2 80    99 73
窶        冱
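The two byte tables above can be reproduced with a short sketch. It uses Python only because its built-in `utf-8` and `cp932` codecs make the byte-level steps easy to see; the question's own code is Objective-C, and the variable names here are illustrative.

```python
# The two characters from the question: U+2019 followed by "s".
original = "\u2019s"

# Step 1: the sender encodes the text as UTF-8.
utf8_bytes = original.encode("utf-8")
print(utf8_bytes.hex(" "))  # e2 80 99 73

# Step 2: the receiver wrongly decodes those same bytes as code page 932.
# The four bytes are regrouped into two double-byte characters.
mojibake = utf8_bytes.decode("cp932")
code_points = [f"U+{ord(ch):04X}" for ch in mojibake]
print(code_points)  # the \U7ab6\U51b1 pair reported in the question
```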

So in this one particular case, you could recover the ’s string by firstly encoding the characters into cp932 bytes, and then decoding those bytes back to characters using UTF-8.
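As a rough sketch of that round trip (again in Python purely for illustration; `undo_cp932_misdecoding` is a hypothetical helper name, not something from the question):

```python
def undo_cp932_misdecoding(garbled: str) -> str:
    """Reverse a 'UTF-8 bytes read as cp932' mix-up (hypothetical helper).

    Only works when the wrong decoding happened to succeed without loss.
    """
    # Turn the wrong characters back into the bytes they were decoded from,
    # then decode those bytes with the encoding they were really written in.
    return garbled.encode("cp932").decode("utf-8")


recovered = undo_cp932_misdecoding("\u7ab6\u51b1")
print(recovered == "\u2019s")  # True
```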

However, this will not solve your real problem, which is that the strings were read in incorrectly in the first place. You got 窶冱 in this case because the UTF-8 byte sequence resulting from encoding ’s happened also to be a valid Shift-JIS byte sequence. But that won't be the case for all possible UTF-8 byte sequences you might get. Many other characters will be unrecoverably mangled.
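One way to see the problem, sketched in Python under the same assumptions as before: put a space after the ’ instead of an s. The byte 0x99 is then a Shift-JIS lead byte followed by 0x20, which is not a valid trail byte, so the wrong decoding either fails or throws the byte away. The sample text is made up for illustration.

```python
# Hypothetical sample: the quotation mark is followed by a space, not "s".
utf8_bytes = "dogs\u2019 bowl".encode("utf-8")  # ... e2 80 99 20 ...

# A strict cp932 decode rejects the 99 20 pair outright.
try:
    utf8_bytes.decode("cp932")
    strict_decode_ok = True
except UnicodeDecodeError:
    strict_decode_ok = False

# A lenient decode substitutes U+FFFD, so the original byte is gone and
# no later re-encoding step can bring the quotation mark back.
lossy = utf8_bytes.decode("cp932", errors="replace")

print(strict_decode_ok)   # False
print("\ufffd" in lossy)  # True
```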

You need to find where bytes are being read into the system and decoded as Shift-JIS, and fix that to use UTF-8 instead.
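Where that happens depends on how the data reaches your code, which the question does not show. As a minimal sketch of the idea only, with `decode_incoming` being a hypothetical function standing in for whatever step first turns bytes into a string:

```python
def decode_incoming(raw: bytes) -> str:
    """Decode received bytes as UTF-8 (hypothetical entry point).

    Decoding strictly means bad input fails loudly here instead of
    turning into mojibake further down the line.
    """
    return raw.decode("utf-8")


text = decode_incoming(b"\xe2\x80\x99s")
print(text == "\u2019s")  # True
```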

bobince