I want to take a link, fetch its HTML, and keep only the part that matters, say the article. There are many HTML parsing libraries for Objective-C (hpple, for example), but I want to do more than parse specific elements: I need something that strips everything that isn't part of the readable content, much like what Instapaper, Readability, Pocket, or Safari's Reader feature do.

What would be the best way to accomplish this in Objective-C/iOS?

Doug Smith
  • This doesn't seem like a very specific question. Anyway, I suggest you read the documentation for NSXMLParser or use a third-party library that parses XML (well-formed XHTML can be parsed as XML, though much real-world HTML cannot). Once you have loaded the HTML document, the only thing you can do is walk every tag and remove the unwanted tags and their content from the document. I believe there isn't anything auto-magical that does what you want ;) – Lolloz89 Mar 22 '13 at 19:06

1 Answer

I'm not sure if there's a way to do this in Objective-C, but Readability had an open-source JavaScript implementation that extracted the main content of web pages. See also this answer and the linked code (called "boilerplate"), which may help you, although it appears to be written in Java.

For just getting links, use NSDataDetector to scan the text.
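
As a rough illustration of the NSDataDetector suggestion, here is a minimal sketch that collects the URLs found in a string. The helper name `ExtractLinks` is hypothetical (it is not part of Foundation or of the original answer), and the sketch assumes the input is plain text rather than raw HTML markup.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper, for illustration only: returns every URL that
// NSDataDetector recognises in the given text.
static NSArray<NSURL *> *ExtractLinks(NSString *text) {
    NSError *error = nil;
    NSDataDetector *detector =
        [NSDataDetector dataDetectorWithTypes:NSTextCheckingTypeLink
                                        error:&error];
    if (detector == nil) {
        // Creation failed; report the error and return no links.
        NSLog(@"Could not create data detector: %@", error);
        return @[];
    }

    NSMutableArray<NSURL *> *links = [NSMutableArray array];
    NSRange fullRange = NSMakeRange(0, text.length);

    // Walk over every match in the string and keep the link results.
    [detector enumerateMatchesInString:text
                               options:0
                                 range:fullRange
                            usingBlock:^(NSTextCheckingResult *result,
                                         NSMatchingFlags flags,
                                         BOOL *stop) {
        // Link results expose the detected address through the URL property.
        if (result.resultType == NSTextCheckingTypeLink && result.URL != nil) {
            [links addObject:result.URL];
        }
    }];
    return links;
}
```

Calling `ExtractLinks(pageText)` with the text of a page should give back an array of `NSURL` objects. Note that the link type may also pick up e-mail addresses, so filter on the URL scheme if you only want web links.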

Community
nevan king