I am trying to extract selected parts (the headline) of the HTML content shown in my WebView. I am open to a wide variety of options, such as JSON parsing, or any hack that works. Has anyone had experience with this, or a rough idea of how to go about it? Here is my example. This is the content of my HTML file:

<div><h1><span class = "headline"> Some depressing title </span> <span class = "source" > ABCD </span> </h1> <br/> <span class = "body"> crappy body content which I do not need </span></div>

I just want to retrieve the "headline" and "source" from this HTML in my WebView and nothing else (not the body). How do I go about defining a parameter to retrieve these? Any clues on how to do it?

Thanks!

Xaver Kapeller

1 Answer

Step 1: Get the HTML source from your WebView (see this question). You basically create a JavaScript interface that passes the HTML source to a Java String.

Step 2: Use an HTML parser (for example, JSoup) to parse the Java String into a format that you can handle easily.

Step 3: Use the parser to extract the relevant information. Here, you could use getElementsByTag("span") to get all your spans and then filter by class, or you could directly use getElementsByClass("headline") and getElementsByClass("source").

In general, you can retrieve the HTML source and parse the DOM in all cases.

Edit: If you don't want to use a parser, you can extract your information by searching the HTML source string (finding the correct classes, then finding the indexes of the '<' and '>' characters to pick out the text). This way is harder, less efficient, and less flexible, but it can be done.
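To illustrate the string-search route, here is a minimal sketch that uses only java.lang.String. The class name SpanTextExtractor and the method textOfSpan are hypothetical, and the code assumes that each span carries a single class and contains plain text with no nested tags, as in the question's sample HTML. It is not a general HTML parser.

```java
// Hypothetical helper: plain string searches, no third-party library.
final class SpanTextExtractor {

    /**
     * Returns the trimmed text of the first <span> whose class attribute
     * equals className, or null if no such span is found.
     * Assumption: the span holds plain text only (no nested tags).
     */
    static String textOfSpan(String html, String className) {
        int from = 0;
        while (true) {
            int open = html.indexOf("<span", from);
            if (open < 0) {
                return null; // no more spans
            }
            int tagEnd = html.indexOf('>', open);
            if (tagEnd < 0) {
                return null; // malformed opening tag
            }
            // Strip whitespace so that class = "x" and class="x" compare equal.
            String compactTag = html.substring(open, tagEnd).replaceAll("\\s+", "");
            if (compactTag.contains("class=\"" + className + "\"")) {
                // The text runs from the end of the opening tag to the next '<'.
                int close = html.indexOf('<', tagEnd + 1);
                if (close < 0) {
                    return null; // span is never closed
                }
                return html.substring(tagEnd + 1, close).trim();
            }
            from = tagEnd + 1; // keep scanning after this tag
        }
    }

    public static void main(String[] args) {
        String html = "<div><h1><span class = \"headline\"> Some depressing title </span> "
                + "<span class = \"source\" > ABCD </span> </h1> <br/> "
                + "<span class = \"body\"> crappy body content which I do not need </span></div>";
        System.out.println(textOfSpan(html, "headline"));
        System.out.println(textOfSpan(html, "source"));
    }
}
```

This only holds up while the page format is known in advance. Nested tags, several classes on one span, or single-quoted attributes would all break it, which is the flexibility cost mentioned above.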

Robin Eisenberg
  • That seems promising, but I have never used JSoup before. Is this the only way to do it? Is there any other easier hack I can use? –  Apr 30 '15 at 15:38
  • I think JSoup seems scary because it is a third-party library, but it gets you your info in three lines of code: Document doc = Jsoup.parse(html); doc.getElementsByClass("headline"); and doc.getElementsByClass("source"). If this is the only HTML parsing you will be doing in your application, and you know the format of the page in advance, you could just perform searches on the String to extract your data. That said, using an HTML parser will be cleaner, more efficient, and more flexible. Adding the parser is as simple as putting a .jar file in a certain folder of your project. – Robin Eisenberg Apr 30 '15 at 15:40
  • Actually, I was also thinking: is it possible to limit a WebView to only the first 2 lines programmatically? That would also solve my issue. I just want to display the first 2 lines, something like maxLines = 2. –  May 01 '15 at 17:08
  • Yes, you can intercept the source, then keep only the first two lines (or every piece of text until you reach the , then call webview.loadData(yourNewHTMLSourceString, "text/html", null); – Robin Eisenberg May 04 '15 at 07:34
  • How do I intercept the source until ? –  May 04 '15 at 10:14
  • Sorry, I'm not going to code the whole thing for you. String search operations (or, regular expressions) will do, but I still maintain that you are making this complicated because you don't want to use a ready-made parser. JSOUP remains the simpler option. Also, please mark this answer as 'accepted' if I answered your question. – Robin Eisenberg May 04 '15 at 10:17
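For the "keep only the first part of the source" idea from the comments, a minimal sketch might look like the following. The marker that the comment had in mind did not survive in the thread, so using "<br/>" (which precedes the body span in the question's sample HTML) is an assumption. The names HtmlTrimmer and keepBefore are hypothetical.

```java
// Hypothetical helper: cut the HTML source at a marker before reloading it.
final class HtmlTrimmer {

    /**
     * Returns the part of html that precedes the first occurrence of marker,
     * or html unchanged when the marker does not occur.
     */
    static String keepBefore(String html, String marker) {
        int cut = html.indexOf(marker);
        return cut < 0 ? html : html.substring(0, cut);
    }

    public static void main(String[] args) {
        String html = "<div><h1><span class = \"headline\"> Some depressing title </span> "
                + "<span class = \"source\" > ABCD </span> </h1> <br/> "
                + "<span class = \"body\"> crappy body content which I do not need </span></div>";
        // Assumption: "<br/>" separates the heading from the body in this page.
        String trimmed = keepBefore(html, "<br/>");
        System.out.println(trimmed);
    }
}
```

The trimmed string can then be passed to webview.loadData(trimmed, "text/html", null), as suggested above. The cut leaves the outer div unclosed. Browsers usually tolerate that, but appending "</div>" keeps the markup well-formed.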