
I'm parsing some HTML with NSXMLParser, and it hits a parser error any time it encounters an ampersand. I could filter out ampersands before I parse it, but I'd rather parse everything that's there.

It's giving me error 68, NSXMLParserNAMERequiredError: Name is required.

My best guess is that it's a character set issue. I'm a little fuzzy on the world of character sets, so I'm thinking my ignorance is biting me in the ass. The source HTML uses charset ISO-8859-1, so I'm using this code to initialize the parser:

NSString *dataString = [[[NSString alloc] initWithData:data encoding:NSISOLatin1StringEncoding] autorelease];
NSData *dataEncoded = [[dataString dataUsingEncoding:NSUTF8StringEncoding allowLossyConversion:YES] autorelease];
NSXMLParser *theParser = [[NSXMLParser alloc] initWithData:dataEncoded];
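For reference, the Latin-1 to UTF-8 round trip above can be sketched in plain C (which is also valid Objective-C). This is only an illustration of the transcoding, not what Foundation does internally, and latin1_to_utf8 is a hypothetical helper name. What it shows: ISO-8859-1 maps each byte directly to the code points U+0000 to U+00FF, and every byte below 0x80, including '&' (0x26), comes out unchanged, so the re-encoding step can neither cause nor cure an ampersand error.

```c
#include <stddef.h>

/* Hypothetical helper: converts ISO-8859-1 (Latin-1) bytes to UTF-8.
   `out` must have room for 2 * in_len bytes (worst case: every byte >= 0x80).
   Returns the number of bytes written to `out`. */
size_t latin1_to_utf8(const unsigned char *in, size_t in_len, unsigned char *out)
{
    size_t o = 0;
    for (size_t i = 0; i < in_len; i++) {
        unsigned char c = in[i];
        if (c < 0x80) {
            /* ASCII, including '&' (0x26), is copied through unchanged. */
            out[o++] = c;
        } else {
            /* Code points U+0080..U+00FF become a two-byte UTF-8 sequence. */
            out[o++] = (unsigned char)(0xC0 | (c >> 6));
            out[o++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    return o;
}
```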

Any ideas?

Silromen
  • You're parsing HTML with an XML parser? How is that ever going to work? (Unless it's well-formed XHTML and doesn't use the HTML entity set.) Either way, a bare ampersand is invalid in both HTML and XML, so you'd need to seek out a parser for real-world broken HTML, which is a much, much harder job than XML parsing. – bobince Nov 12 '09 at 00:49

3 Answers


To the other posters: of course the XML is invalid... it's HTML!

You probably shouldn't be trying to use NSXMLParser for HTML, but rather libxml2.

For a closer look at why, check out this article.

Benjamin Cox
  • Okay, then. Wrong tool for the job? Thanks for the tip. I may have to do that. – Silromen Nov 12 '09 at 00:55
  • Great point about the HTML, the NSXMLParser part threw me off. libxml2 seems like a very reasonable alternative. See this previous SO article: http://stackoverflow.com/questions/405749/parsing-html-on-the-iphone – Epsilon Prime Nov 12 '09 at 00:59

Are you sure you have valid XML? Special characters like & are required to be escaped; in the raw XML file you should see &amp; instead.
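If you want to keep the ampersands rather than filter them out, one option is to escape the bare ones before handing the data to NSXMLParser. Below is a minimal sketch in plain C (also valid Objective-C) that works on the UTF-8 bytes; escape_bare_ampersands and reference_length are hypothetical helper names, not part of any library. It leaves anything that already looks like a reference (&amp;, &#38;, &#x26;) alone and turns every other & into &amp;. This only addresses the "Name is required" error: HTML named entities such as &nbsp; are not predefined in XML, so NSXMLParser may still reject them, and the rest of the HTML still has to be well-formed.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: if s (with s[0] == '&') starts a well-formed reference
   such as "&amp;", "&#38;" or "&#x26;", returns its length; otherwise 0. */
static size_t reference_length(const char *s)
{
    size_t i = 1;
    if (s[i] == '#') {
        i++;
        int hex = (s[i] == 'x' || s[i] == 'X');
        if (hex)
            i++;
        size_t start = i;
        while (hex ? isxdigit((unsigned char)s[i]) : isdigit((unsigned char)s[i]))
            i++;
        if (i == start)
            return 0; /* "&#;" or "&#x;" has no digits */
    } else {
        /* ASCII-only approximation of an XML name. */
        if (!(isalpha((unsigned char)s[i]) || s[i] == '_' || s[i] == ':'))
            return 0;
        i++;
        while (isalnum((unsigned char)s[i]) || s[i] == '_' || s[i] == ':' ||
               s[i] == '.' || s[i] == '-')
            i++;
    }
    return s[i] == ';' ? i + 1 : 0;
}

/* Hypothetical helper: returns a malloc'd copy of `in` with every bare '&'
   replaced by "&amp;". The caller must free() it. Returns NULL on failure. */
char *escape_bare_ampersands(const char *in)
{
    size_t len = strlen(in);
    /* Worst case: every byte is a bare '&' that grows to 5 bytes. */
    char *out = malloc(len * 5 + 1);
    if (out == NULL)
        return NULL;
    char *p = out;
    for (size_t i = 0; i < len; ) {
        if (in[i] == '&') {
            size_t ref = reference_length(in + i);
            if (ref == 0) {
                memcpy(p, "&amp;", 5); /* bare ampersand: escape it */
                p += 5;
                i++;
            } else {
                memcpy(p, in + i, ref); /* existing reference: keep as-is */
                p += ref;
                i += ref;
            }
        } else {
            *p++ = in[i++];
        }
    }
    *p = '\0';
    return out;
}
```

The function expects a NUL-terminated buffer, so on the Objective-C side you would pass something like [dataString UTF8String] and wrap the result back into an NSData (for example with dataWithBytes:length:) before creating the parser.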

Kendall Helmstetter Gelner

Encoding the data through an NSString worked for me. However, you are autoreleasing an object you don't own: the NSData returned by dataUsingEncoding: is already autoreleased, so the extra autorelease over-releases it, which leads to a crash. The solution is:

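NSString *dataString = [[NSString alloc] initWithData:data
                                             encoding:NSISOLatin1StringEncoding];

// dataUsingEncoding: returns an autoreleased object; don't release or autorelease it yourself.
NSData *dataEncoded = [dataString dataUsingEncoding:NSUTF8StringEncoding
                               allowLossyConversion:YES];

// Balances the alloc above; dataEncoded is a separate object and stays valid.
[dataString release];

NSXMLParser *theParser = [[NSXMLParser alloc] initWithData:dataEncoded];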
Krishnabhadra
Jerome