
I have searched a lot about UTF-8 decoding, but have not found an answer yet.

I receive a UTF-8 decoded NSString from my NSXMLParser:

NSString *tempString = @"Test message readability is óké";

Somehow I can't find a way to convert this garbled text to:

Test message readability is óké

I could list all the options I tried, but I don't think that is necessary. Could someone please help?

Thanks!

MiiChiel

3 Answers


NSXMLParser interprets the text using the character encoding that the XML declares. I believe that in your case the XML does not specify UTF-8 explicitly.

The text seems to have been decoded as ISO Latin 1. If you cannot do anything about the server generating the XML, then you can apply this hack:

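// cStringUsingEncoding: returns a const pointer, or NULL if the string cannot be represented in Latin 1.
const char *tempString = [string cStringUsingEncoding:NSISOLatin1StringEncoding];
if (tempString) string = [NSString stringWithUTF8String:tempString];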

I have verified that this works by testing this from the GDB prompt:

po [NSString stringWithUTF8String:(char*)[@"Test message readability is óké" cStringUsingEncoding:5]]
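
To see what the Latin 1 round trip does at the byte level, here is a minimal sketch in plain C (which also compiles inside an Objective-C file). The function name `undo_latin1_mojibake` is made up for illustration and is not part of Foundation. It assumes the garbled text contains only code points up to U+00FF, so it refuses text that was garbled through Windows-1252 rather than ISO Latin 1.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>   /* strlen, for sizing the output buffer at the call site */

/*
 * Hypothetical helper: undo "UTF-8 bytes decoded as Latin 1, then stored as UTF-8".
 *   in  - NUL-terminated UTF-8 text whose code points are all <= U+00FF
 *   out - buffer of at least strlen(in) + 1 bytes
 * Returns the number of bytes written (excluding the NUL), or -1 if the input
 * contains a code point above U+00FF or a truncated sequence.
 */
long undo_latin1_mojibake(const char *in, char *out)
{
    assert(in != NULL && out != NULL);
    const unsigned char *p = (const unsigned char *)in;
    size_t n = 0;

    while (*p) {
        if (*p < 0x80) {
            /* ASCII is identical in Latin 1 and UTF-8: copy it through. */
            out[n++] = (char)*p++;
        } else if ((*p == 0xC2 || *p == 0xC3) && (p[1] & 0xC0) == 0x80) {
            /* Two-byte sequence for U+0080..U+00FF: the code point value
               is exactly the original byte that was misread as Latin 1. */
            out[n++] = (char)(((p[0] & 0x1F) << 6) | (p[1] & 0x3F));
            p += 2;
        } else {
            return -1;  /* not representable in Latin 1, or malformed */
        }
    }
    out[n] = '\0';
    return (long)n;
}
```

Given the UTF-8 bytes of "óké" (C3 83 C2 B3 6B C3 83 C2 A9), the function writes C3 B3 6B C3 A9, which is "óké" in UTF-8. The sketch does not check that the recovered bytes form valid UTF-8, so treat a successful return as "possibly fixed" rather than as proof.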
PeyloW
  • That worked, thanks! Xcode gave me a warning to use 'const char *' instead of 'char *'. I suppose this does not cause any memory leaks? I am not used to working with char. But if I understand you correctly, you are saying the XML file is not saved correctly as UTF-8, right? – MiiChiel Jul 27 '11 at 15:34
  • No leaks here, and adding the `const` is the right thing to do. And yes, I believe the server either does not specify which character encoding is used, or it specifies the wrong encoding. – PeyloW Jul 27 '11 at 15:38
  • The weird part is that when I create my own XML file by echoing it from PHP, my browser shows the right characters, but in Objective-C they change to the wrong characters... – MiiChiel Jul 27 '11 at 15:59
  • Do you use an XML declaration (`<?xml ... ?>`) at the head of the document? – PeyloW Jul 27 '11 at 18:17

You're doing it wrong. What you want is:

char *s = "Test message readability is óké";
//Note: this is a one-byte-character C string, not an NSString!
NSString *tempString = [NSString stringWithCString:s encoding:NSUTF8StringEncoding];

Also keep in mind that when you initialize string constants, what actually goes into program memory depends on the encoding of the source file. If the file is already UTF-8, the characters will be doubly encoded: you'll get the characters Ã, ³, etc., each encoded as UTF-8 in the C string.

In other words, using a string constant is probably the wrong move to begin with. Please give more context about the problem.

Seva Alekseyev
  • Thank you for your response. I receive XML that states encoding="UTF-8". One of the elements contains a string with characters like é and à. But when I open the XML in my browser, I already see these óké characters. The thing is: when I reopen the text in TextWrangler (a text editor app) as UTF-8, it shows the right characters; if I reopen it as Windows Latin 1, it gives me the óké. Oh, and your example does not work, I already tried it! – MiiChiel Jul 27 '11 at 15:16
  • If the XML is in UTF8, then opening it as Windows Latin 1 would give you wrong characters, that's to be expected. The question is, what are you trying to accomplish *in code* in the first place? Initializing an NSString from a string literal (either directly or via a C-string) is NOT equivalent to what NSXMLParser is giving you - there's an extra encoding layer that is the source file encoding. – Seva Alekseyev Jul 27 '11 at 15:21
  • I get an XML file from someone else that shows the wrong characters, óké, when I parse it. I see the same 'wrong' characters when opening the XML URL in my browser. Is there a way to convert the string or the loaded XML string to the right characters? – MiiChiel Jul 27 '11 at 15:26
  • My first thought is that the XML file is not saved as UTF-8 but declares itself as UTF-8 in the header, causing the wrong characters. – MiiChiel Jul 27 '11 at 15:30

Standard percent-encoding and decoding look like this:

For encoding:

NSString *content = [bodyTextView.text stringByAddingPercentEscapesUsingEncoding:NSUTF8StringEncoding];

For decoding:

NSString *decodedString = [msg.content stringByReplacingPercentEscapesUsingEncoding:NSUTF8StringEncoding];
Hiren
Bob