I am trying to use NSXMLParser to parse an XML file that looks something like this:

<?xml version="1.0" encoding="us-ascii"?>
<teams>
    <team id = "A1">
        <player1>John</player1>
        <player2>José</player2>
    </team>
    ...
</teams>

I use the following code:

NSString *urlString = [NSString stringWithFormat:@"http://www....abc.php?category=%@&poule=%c", @"S", 'B'];  // Obviously, this contains an actual web address
NSURL *url = [NSURL URLWithString:urlString];
NSData *xml = [[NSData alloc] initWithContentsOfURL:url];   // <==
NSXMLParser *xmlParserObject = [[NSXMLParser alloc]initWithData:xml];
[xmlParserObject setDelegate:self];
[xmlParserObject parse];

and I implemented the didStartElement, foundCharacters, didEndElement and parseErrorOccurred delegate methods.

This all goes well until a 'special' character, such as an é, is encountered. The parseErrorOccurred delegate method then reports the following errors:

parser error: Error Domain=NSXMLParserErrorDomain Code=1544 "The operation couldn’t be completed. (NSXMLParserErrorDomain error 1544.)"
parser error: Error Domain=NSXMLParserErrorDomain Code=5 "The operation couldn’t be completed. (NSXMLParserErrorDomain error 5.)"

Then I replaced the part marked with '<==' with the following:

NSError *error;
NSData *xml = [NSString stringWithContentsOfURL:url encoding:NSUTF8StringEncoding error:&error];
if (xml == nil) {
    NSLog(@"*** Fatal error: %@\nuserInfo:%@", error, [error userInfo]);
}

and got the following error in addition to the one above:

 *** Fatal error: Error Domain=NSCocoaErrorDomain Code=261 "The operation couldn’t be completed. (Cocoa error 261.)" UserInfo=0x8158d90 {NSURL=http://www....abc.php?category=S&poule=B, NSStringEncoding=4}
userInfo:{
    NSStringEncoding = 4;
    NSURL = "http://www....abc.php?category=S&poule=B";
}

I also tried replacing NSUTF8StringEncoding with other encodings, such as NSISOLatin1StringEncoding, NSUTF16StringEncoding, NSASCIIStringEncoding, NSUnicodeStringEncoding and more. This resulted in the following error:

 -[__NSCFString bytes]: unrecognized selector sent to instance 0x6e4cbc0
 *** Terminating app due to uncaught exception 'NSInvalidArgumentException', reason: '-[__NSCFString bytes]: unrecognized selector sent to instance 0x6e4cbc0'
*** First throw call stack:
(0x12d0022 0x1781cd6 0x12d1cbd 0x1236ed0 0x1236cb2 0xce5f51 0xb447 0xaa89 0x1f2e330 0x1f2f439 0x908b9b24 0x908bb6fe)
terminate called throwing an exception(lldb) 

I have no control over the contents of the XML, but if it indeed contains incorrect information, then maybe I can talk to the webmaster.

I'm fine with displaying the é character as 'e' or '?' if that's what it takes.

Any advice on what causes this error and how to correct or bypass it is greatly appreciated.

Tx!

--GB

gbroekstg

2 Answers

I found a bypass (not a solution) to this problem. To get from the NSURL to the NSData, I have used the following code:

        NSError *error;
        NSString *xmlText = [NSString stringWithContentsOfURL:url encoding:NSASCIIStringEncoding error:&error];
        xmlText = [xmlText stringByReplacingOccurrencesOfString:@"é" withString:@"e"];
        NSData *xml = [xmlText dataUsingEncoding:NSASCIIStringEncoding];

So basically, I

  • Read the contents of the NSURL into an NSString
  • Edited that string by replacing the 'special' characters
  • Used the edited string to create the NSData

I also found that I had to use NSASCIIStringEncoding instead of NSUTF8StringEncoding (which is what the XML specifies, but which failed earlier).
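The same bypass can be generalized so that it does not depend on listing each 'special' character by hand. The following is only a minimal sketch in Swift, not the code used above: `asciiOnlyData(from:)` is a hypothetical helper, and decoding the bytes as ISO Latin-1 is an assumption about what the server actually sends. It relies on Foundation's lossy conversion to substitute characters that have no ASCII representation.

```swift
import Foundation

// Hypothetical helper, not part of the original answer.
// Decodes the downloaded bytes as ISO Latin-1 (an assumption about the feed's
// real encoding) and re-encodes them as ASCII, letting Foundation substitute
// any character that cannot be represented in ASCII.
func asciiOnlyData(from raw: Data) -> Data? {
    guard let text = String(data: raw, encoding: .isoLatin1) else { return nil }
    return text.data(using: .ascii, allowLossyConversion: true)
}

// "José" as ISO Latin-1 bytes; 0xE9 is the é.
let sample = Data([0x4A, 0x6F, 0x73, 0xE9])
if let cleaned = asciiOnlyData(from: sample) {
    // Every remaining byte should now be in the 7-bit ASCII range.
    print(cleaned.allSatisfy { $0 < 0x80 })
}
```

The resulting data can then be passed to the parser in place of the raw download. Like the bypass itself, this loses information, so it only fits cases where showing 'e' or '?' instead of the original character is acceptable.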

Anyway, suggestions to really solve the problem are still welcome, but this bypass works for me for the time being...

gbroekstg

With invalid UTF-8 characters, it is better to 'clean' the data received from your source before handing it over to NSXMLParser. Converting the data to ASCII, as is often suggested when dealing with NSXMLParser, is not always a good idea, for example when your source contains Cyrillic characters.

In Swift it can be done like this:

// 'data' holds the downloaded bytes, possibly containing malformed UTF-8.
// Decoding repairs each invalid byte sequence by substituting U+FFFD.
let repairedString = String(decoding: data, as: UTF8.self)
// Drop the replacement characters that mark the repaired positions.
let cleanString = repairedString.replacingOccurrences(of: "\u{FFFD}", with: "")
if let cleanData = cleanString.data(using: .utf8) {
    self.parser = XMLParser(data: cleanData) // Assuming a 'parser' property is already present
}

Based on Cleaning malformed UTF8 strings
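If the characters should be preserved rather than stripped, one option is to re-encode the data as UTF-8 before parsing. This is a rough sketch under assumptions, not a verified fix for the feed in the question: `utf8XMLData(from:)` is a hypothetical helper, ISO Latin-1 as the fallback encoding is a guess about the server, and the declaration string it patches is the one shown in the question's sample.

```swift
import Foundation

// Hypothetical helper, not from the original answer.
func utf8XMLData(from raw: Data) -> Data {
    // Try strict UTF-8 first. Falling back to ISO Latin-1 is an assumption
    // about the server; the last resort repairs invalid bytes with U+FFFD.
    let text = String(data: raw, encoding: .utf8)
        ?? String(data: raw, encoding: .isoLatin1)
        ?? String(decoding: raw, as: UTF8.self)
    // Rewrite the declared encoding so it matches the bytes given to the parser.
    let patched = text.replacingOccurrences(
        of: "encoding=\"us-ascii\"", with: "encoding=\"UTF-8\"")
    return Data(patched.utf8)
}
```

The result would be used as `XMLParser(data: utf8XMLData(from: data))`. Choosing the wrong fallback encoding yields the wrong characters rather than an error, so it is worth confirming the real encoding with whoever maintains the source.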

Ely