
When using NSXMLParser (indirectly, through Michael Waterfall's MWFeedParser library) to parse the following RSS feed:

http://qdb.us/qdb.xml?action=latest

NSURL *feedURL = [NSURL URLWithString:@"http://qdb.us/qdb.xml?action=random"];
self.feedParser = [[MWFeedParser alloc] initWithFeedURL:feedURL];
self.feedParser.delegate = self;
self.feedParser.feedParseType = ParseTypeFull; // Parse feed info and all items
self.feedParser.connectionType = ConnectionTypeAsynchronously;
[self.feedParser parse];

I receive an invalidly formatted XML document, which appears to be caused by an illegal character in the feed:

http://validator.w3.org/check?uri=http%3A%2F%2Fqdb.us%2Fqdb.xml%3Faction%3Dlatest&charset=utf-8&doctype=Inline&group=0&user-agent=W3C_Validator%2F1.1

I've tried changing the document's encoding from ISO-8859-1 to UTF-8, but the problem still occurs.

How do I identify the illegal character, and how do I keep the RSS parser from failing when it encounters such characters?

References: (links I've already investigated)

HTML character decoding in Objective-C / Cocoa Touch

https://stackoverflow.com/users/106244/michael-waterfall

Ben Priebe
DTHTMLParser has nearly the exact same implementation as NSXMLParser but will allow 'illegal characters'. – endy Apr 26 '12 at 01:23

2 Answers


I don't know how to make NSXMLParser ignore illegal characters, but you could strip them with a regex before parsing. Alternatively, I suggest using KissXML instead of NSXMLParser, since it may cope with illegal characters; see "How To Choose The Best XML Parser for Your iPhone Project".
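To find out which character is the culprit before deciding how to remove it, the raw bytes can be scanned for control characters that XML 1.0 does not allow (everything below 0x20 except tab, line feed and carriage return). The following is a minimal sketch in plain C, which also compiles as Objective-C. The function names are made up for illustration, and it assumes a byte-oriented encoding such as ISO-8859-1 or UTF-8; it does not check for other invalid code points such as U+FFFE or U+FFFF.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if the byte is a control character
 * that XML 1.0 forbids (below 0x20, other than TAB, LF and CR). */
static int is_illegal_xml_byte(unsigned char b) {
    return b < 0x20 && b != 0x09 && b != 0x0A && b != 0x0D;
}

/* Hypothetical helper: returns the offset of the first illegal byte
 * in buf, or -1 if none is found. Assumes ISO-8859-1 or UTF-8 input,
 * where all bytes of multi-byte sequences are >= 0x80. */
long first_illegal_xml_byte(const unsigned char *buf, size_t len) {
    for (size_t i = 0; i < len; i++) {
        if (is_illegal_xml_byte(buf[i])) {
            return (long)i;
        }
    }
    return -1;
}
```

Logging the returned offset together with the byte value at that position should show which character the feed contains and where.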

Yuwen Yan

I ran into something similar while parsing EPG data fetched from the REST API of my Enigma2 receiver. In this case, one service was pushing EPGInfo containing the illegal character 0x05.
I have implemented a cleanup method for incoming NSData. This is the poor man's way to filter these 0x05 bytes from the NSData I receive from NSURLSession before passing it to the parser:

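// Returns a copy of `data` with every 0x05 byte removed.
- (NSData *)DataCleaned:(NSData *)data {
   if (data.length == 0) {
      return data; // nothing to filter
   }
   const char *old = (const char *)data.bytes;
   // The filtered output can never be longer than the input.
   char *flt = (char *)malloc(data.length);
   if (flt == NULL) {
      return nil; // allocation failed
   }
   NSUInteger cnt = 0;
   for (NSUInteger i = 0; i < data.length; i++) {
      if (old[i] != 0x05) {
         flt[cnt++] = old[i];
      }
   }
   // dataWithBytes:length: copies the buffer, so it is safe to free it afterwards.
   NSData *clean = [NSData dataWithBytes:flt length:cnt];
   free(flt);
   return clean;
}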

In my case, this solved the problem. But of course this requires loading the whole response into NSData before parsing it.
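The same idea can be generalized from 0x05 to every control byte that XML 1.0 forbids. This is only a sketch under assumptions: the function name is hypothetical, it is plain C (usable from Objective-C), and it assumes ISO-8859-1 or UTF-8 input, so it would corrupt UTF-16 data.

```c
#include <stddef.h>

/* Hypothetical helper: removes, in place, every byte below 0x20 except
 * TAB (0x09), LF (0x0A) and CR (0x0D), and returns the new length.
 * Assumes a byte-oriented encoding such as ISO-8859-1 or UTF-8. */
size_t strip_illegal_xml_bytes(unsigned char *buf, size_t len) {
    size_t kept = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char b = buf[i];
        int illegal = b < 0x20 && b != 0x09 && b != 0x0A && b != 0x0D;
        if (!illegal) {
            buf[kept++] = b; /* compact the surviving bytes to the front */
        }
    }
    return kept;
}
```

From Objective-C, this could be applied to an NSMutableData copy of the response by passing its mutableBytes and length, then shrinking the data with setLength: to the returned value.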

ssb