
I need to read an HTML file and search it for certain tags. Depending on what I find, some tags need to be removed, others changed, and some attributes refined; then the file is written back.

Is NSXMLDocument the way to go? I don't think a full parser is really needed here; it could even mean more work. I also don't want to touch the entire file: all I need is to load it into memory, change a few things, and save it again.

Note that I'll be dealing with HTML, not XHTML. Could that be a problem for NSXMLDocument? Unmatched or unclosed tags might make it fail.

sidyll

3 Answers


NSXMLDocument is the way to go: it lets you use XPath/XQuery to find the tags you want. Malformed HTML might be a problem, but if you pass the NSXMLDocumentTidyHTML option when creating the document it should be OK unless the markup is really bad.
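A minimal sketch of that approach, assuming ARC: parse with NSXMLDocumentTidyHTML, locate nodes with XPath, edit the tree, and serialize it back as HTML. The function name `RewriteHTML` and the three edits (drop `<script>`, rename `<b>` to `<strong>`, strip `bgcolor`) are hypothetical examples, not anything from the question. The XPath uses `local-name()` in case Tidy's XHTML output puts elements in a default namespace.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: tidy-parse HTML, apply example edits via XPath,
// and return the result serialized as HTML (nil if parsing failed).
NSString *RewriteHTML(NSString *html, NSError **error) {
    NSXMLDocument *doc = [[NSXMLDocument alloc] initWithXMLString:html
                                                          options:NSXMLDocumentTidyHTML
                                                            error:error];
    if (doc == nil) {
        return nil; // even Tidy could not make sense of the markup
    }

    // Remove every <script> element; local-name() ignores any default namespace.
    for (NSXMLNode *node in [doc nodesForXPath:@"//*[local-name()='script']" error:error]) {
        [node detach];
    }

    // Rename <b> elements to <strong>; their children stay in place.
    for (NSXMLNode *node in [doc nodesForXPath:@"//*[local-name()='b']" error:error]) {
        [node setName:@"strong"];
    }

    // Strip the bgcolor attribute from every element that has one.
    for (NSXMLNode *node in [doc nodesForXPath:@"//*[@bgcolor]" error:error]) {
        [(NSXMLElement *)node removeAttributeForName:@"bgcolor"];
    }

    // Serialize as HTML rather than the XHTML Tidy produced.
    [doc setDocumentContentKind:NSXMLDocumentHTMLKind];
    return [doc XMLString];
}
```

One trade-off worth knowing: the whole document is parsed and re-serialized, so whitespace and markup outside the edited tags may come out normalized rather than byte-for-byte identical to the input.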

Joshua Smith

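// Assumes `string` holds the file contents, <htmlTag> appears once, and
// `newContent` is a placeholder for your replacement text.
NSRange startRange = [string rangeOfString:@"<htmlTag>"];
NSRange endRange = [string rangeOfString:@"</htmlTag>"];
if (startRange.location != NSNotFound && endRange.location != NSNotFound && endRange.location >= NSMaxRange(startRange)) {
    NSString *subStr = [string substringWithRange:NSMakeRange(NSMaxRange(startRange), endRange.location - NSMaxRange(startRange))];
    NSString *finalStr = [string stringByReplacingOccurrencesOfString:subStr withString:newContent];
}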

Then write finalStr to the file, for example with writeToFile:atomically:encoding:error:.

This is what I would do. Note that I don't know exactly what advantages NSXMLDocument would bring here; this should do the job perfectly.
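A self-contained sketch of the same plain-string idea, covering the load/edit/save cycle from the question. `ReplaceTagContents` and `RewriteFile` are hypothetical names, and the sketch assumes UTF-8 files, an opening tag without attributes, and no nesting of the same tag. It replaces by range rather than by `stringByReplacingOccurrencesOfString:`, so identical text elsewhere in the file is left alone.

```objc
#import <Foundation/Foundation.h>

// Replace whatever sits between the first <tag> and the next </tag>.
// Returns nil if either tag is missing. Assumes no attributes on <tag>.
NSString *ReplaceTagContents(NSString *html, NSString *tag, NSString *replacement) {
    NSString *open = [NSString stringWithFormat:@"<%@>", tag];
    NSString *close = [NSString stringWithFormat:@"</%@>", tag];

    NSRange openRange = [html rangeOfString:open];
    if (openRange.location == NSNotFound) {
        return nil;
    }
    // Look for the closing tag only after the opening one.
    NSUInteger contentStart = NSMaxRange(openRange);
    NSRange closeRange = [html rangeOfString:close
                                     options:0
                                       range:NSMakeRange(contentStart, html.length - contentStart)];
    if (closeRange.location == NSNotFound) {
        return nil;
    }
    NSRange contentRange = NSMakeRange(contentStart, closeRange.location - contentStart);
    return [html stringByReplacingCharactersInRange:contentRange withString:replacement];
}

// Hypothetical wrapper: load the file, edit it in memory, save it back.
// Returns NO (possibly without setting *error) if the tag is not found.
BOOL RewriteFile(NSString *path, NSError **error) {
    NSString *html = [NSString stringWithContentsOfFile:path
                                               encoding:NSUTF8StringEncoding
                                                  error:error];
    if (html == nil) {
        return NO;
    }
    NSString *result = ReplaceTagContents(html, @"title", @"New title");
    if (result == nil) {
        return NO;
    }
    return [result writeToFile:path atomically:YES encoding:NSUTF8StringEncoding error:error];
}
```

For example, `ReplaceTagContents(@"<div><p>old</p></div>", @"p", @"new")` yields `@"<div><p>new</p></div>"`. Everything outside the replaced range is kept exactly as it was, which is the main advantage over a parse-and-reserialize approach; the cost is that attributes, nesting, and odd spacing inside tags are not handled.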

Antwan van Houdt

NSXMLDocument may fail, because HTML pages are often not well-formed, but you can try NSXMLDocumentTidyHTML/NSXMLDocumentTidyXML (you can combine them to improve results), as outlined here. Also have a look at this for an approach to modifying the HTML.
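A small sketch of that fallback, assuming ARC: attempt a strict XML parse first and only retry with the two Tidy options combined if it fails. `LoadHTMLDocument` and its `usedTidy` out-parameter are hypothetical names used for illustration.

```objc
#import <Foundation/Foundation.h>

// Try a strict parse; on failure, retry with Tidy (HTML + XML cleanup).
// *usedTidy reports whether the Tidy path produced the document.
NSXMLDocument *LoadHTMLDocument(NSString *html, BOOL *usedTidy, NSError **error) {
    NSXMLDocument *doc = [[NSXMLDocument alloc] initWithXMLString:html options:0 error:NULL];
    if (doc != nil) {
        if (usedTidy) *usedTidy = NO;
        return doc; // already well-formed, nothing was rewritten
    }
    NSXMLNodeOptions tidy = NSXMLDocumentTidyHTML | NSXMLDocumentTidyXML;
    doc = [[NSXMLDocument alloc] initWithXMLString:html options:tidy error:error];
    if (usedTidy) *usedTidy = (doc != nil);
    return doc;
}
```

Parsing strictly first means well-formed input is not run through Tidy at all, so its markup is not normalized unless that is actually needed.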

sergio