0

NSString *myText = [webView stringByEvaluatingJavaScriptFromString:@"document.documentElement.innerText"];
NSLog(@"my text -> %@", myText);

I get all the JavaScript for the webView, but what I want is to save only the body text of the web page. Can anybody help me with some code or ideas? Thanks.

Stephen Darlington

2 Answers

1

Take the innerText of a specific element in the document, e.g. the body element (document.body.innerText).

adf88
  • Thanks for the reply. I tried it, but I still get all the links, so I will do some searching. If you can help me, I will be grateful. –  Aug 12 '10 at 10:25
  • Thanks again for the reply. What I do (like the Vienna application) is let the user enter the desired site, then get the RSS feed and store the body. When I use document.body.innerText I get the body and the other links... I haven't found the solution yet, but I'm still looking. –  Aug 16 '10 at 07:42
  • So what do you want to store exactly? – adf88 Aug 16 '10 at 07:56
  • I want to store all of the body text. When I use - (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName attributes:(NSDictionary *)attributeDict I can store the description, the title, etc. The description is only a few lines from a long text, so what I need is to go to the original site and get all the text. When I use NSString *content=[webView stringByEvaluatingJavaScriptFromString:@"document.body.innerText"]; I get the text, the links, the date... So can you help me get "only the original text"? –  Aug 16 '10 at 11:04
  • What is "only the original text"? You see, that's the problem: you must specify precisely which part of the page you want. "Only the original text" is very confusing. So what do you want to store **exactly**? Generally speaking, you must traverse the DOM tree and pick out what you want. Is your app dedicated to some chosen website(s), or is it general-purpose (if so, what is the purpose)? If you want to know how a certain website is built (what its DOM tree looks like), I suggest you use a browser debugger such as Firebug for Firefox. – adf88 Aug 17 '10 at 06:49
  • If you go to this page:
    http://www.boston.com/news/nation/articles/2010/08/17/obama_sharpens_message_for_fall_election/
    you see the title, the image and the text, and also some links and some buttons (minuButton, plusButton, printButton...). What I want is to take the text as a string, copy it into my app and use it.
    –  Aug 17 '10 at 07:53
  • But tell me, what is your goal? A generic way to extract the text of an article on the boston.com site? – adf88 Aug 17 '10 at 11:33
  • I'll give you an example: my app lets the user decide where to read the news from. The user enters the site, and the app retrieves the title, the text and the picture and shows them in my app with my own interface design. –  Aug 17 '10 at 12:10
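
The DOM traversal suggested in the comments above could look roughly like the sketch below. It is only an illustration, not a tested solution for boston.com or any other site: the helper names `collectText` and `articleText` and the list of skipped tags are assumptions made for this example, and a real page would need the list tuned after inspecting its DOM in a debugger.

```javascript
// Tags whose contents are ignored. This list is an assumption for the
// sketch; adjust it after inspecting the target page's DOM.
var SKIPPED_TAGS = { A: true, SCRIPT: true, STYLE: true, BUTTON: true };

// Hypothetical helper: walks the tree under `node` depth-first and
// appends the text of every text node that is not inside a skipped tag.
function collectText(node, parts) {
  parts = parts || [];
  if (node.nodeType === 3) {
    // Text node: normalise whitespace and keep it if anything is left.
    var text = node.nodeValue.replace(/\s+/g, ' ').trim();
    if (text) {
      parts.push(text);
    }
  } else if (node.nodeType === 1 && !SKIPPED_TAGS[node.tagName.toUpperCase()]) {
    // Element node that is not skipped: recurse into its children.
    for (var i = 0; i < node.childNodes.length; i++) {
      collectText(node.childNodes[i], parts);
    }
  }
  return parts;
}

// Hypothetical helper: joins the collected fragments into one string.
function articleText(root) {
  return collectText(root).join(' ');
}
```

If the two function definitions are included in the script string, the expression `articleText(document.body)` could then be evaluated with stringByEvaluatingJavaScriptFromString: in place of `document.body.innerText`. Note that skipping every `A` element also drops links that appear inside the article text itself, so starting from the element that wraps the article, rather than from `document.body`, is probably the more robust choice.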
0

It sounds like you want to get the text of the document excluding the tags.

If the page you're visiting uses jQuery, you could simply use $('body').text() to achieve this.

If not, you may need to strip off the tags with a regular expression yourself. This post seems to have an answer for this problem.
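
A minimal sketch of the regular-expression approach, assuming the page's HTML is already available as a string. The function name `stripTags` is made up for this example, and it is not the code from the linked post. Regular expressions are not a real HTML parser, so this will mishandle unusual markup (for example a `>` inside an attribute value), and it does not decode entities such as `&amp;`.

```javascript
// Hypothetical helper: removes tags from an HTML string using regular
// expressions and returns the remaining text.
function stripTags(html) {
  return html
    // Drop script and style blocks entirely, including their contents.
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, ' ')
    // Replace every remaining tag with a space so adjacent words do not fuse.
    .replace(/<[^>]+>/g, ' ')
    // Collapse runs of whitespace into single spaces.
    .replace(/\s+/g, ' ')
    .trim();
}
```

For example, `stripTags('<p>Hello <b>world</b></p>')` gives `Hello world`. Like `document.body.innerText`, this keeps link and button text, so on its own it does not separate the article from the rest of the page.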

Community
William Niu