I'm writing an Android app that takes relevant data from a website and presents it to the user (HTML scraping). The application downloads the page source and parses it, looking for relevant data to store in objects. I actually made a parser using JSoup, but it turned out to be really slow in my app. Also, libraries like this tend to be rather large, and I want my app to be lightweight.
The webpages I'm trying to parse all have a similar structure, and I know exactly which tags I'm looking for. So I figured I might as well download the source code and read it line by line, looking for the relevant data using String methods such as String.equals. For example, if the HTML looked like this:
<textTag class="text">I want this text</textTag>
I would parse it using methods like:
private void interpretHtml(String s) {
    // The opening tag is 22 characters long and the closing tag is 10.
    if (s.startsWith("<textTag class=\"text\"")) {
        String text = s.substring(22, s.length() - 10);
    }
}
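To make the idea a bit more concrete, here is a slightly more defensive sketch of the same approach. It looks up the tag positions with indexOf instead of relying on hard-coded offsets, so leading whitespace or trailing content on the line would not break it. The class name TagTextExtractor, the method name extract, and the two marker constants are made up for illustration, and it still assumes the opening and closing tags sit on the same line:

```java
final class TagTextExtractor {
    // Hypothetical markers matching the example tag above.
    private static final String OPEN = "<textTag class=\"text\">";
    private static final String CLOSE = "</textTag>";

    /**
     * Returns the text between OPEN and CLOSE on this line,
     * or null if the line does not contain both markers in that order.
     */
    static String extract(String line) {
        int start = line.indexOf(OPEN);
        if (start < 0) {
            return null; // no opening tag on this line
        }
        start += OPEN.length(); // skip past the opening tag itself
        int end = line.indexOf(CLOSE, start);
        if (end < 0) {
            return null; // opening tag without a closing tag on the same line
        }
        return line.substring(start, end);
    }
}
```

Calling TagTextExtractor.extract on the example line above would give back the text between the tags, and null for any line that doesn't match.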
However, I have very little knowledge about setting up connections (I've seen people use HttpGet, but I'm not entirely sure how to get the data out of it). I've searched for quite some time for information on how to parse like this, but most people resort to libraries like JSoup, SAX, etc. to do the parsing.
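For reference, a minimal sketch of what the download step might look like with java.net.HttpURLConnection from the standard library rather than HttpGet. The names PageDownloader, readLines, and fetch are made up for illustration, the 10-second timeouts are arbitrary, and the page is assumed to be UTF-8. On Android this would presumably also have to run off the main thread and would need the INTERNET permission:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class PageDownloader {
    /** Reads the source line by line and returns the trimmed lines. */
    static List<String> readLines(Reader source) throws IOException {
        List<String> lines = new ArrayList<>();
        BufferedReader reader = new BufferedReader(source);
        String line;
        while ((line = reader.readLine()) != null) {
            lines.add(line.trim()); // drop indentation so prefix checks are simpler
        }
        return lines;
    }

    /** Downloads the page at pageUrl and returns its lines (UTF-8 assumed). */
    static List<String> fetch(String pageUrl) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(pageUrl).openConnection();
        try {
            conn.setConnectTimeout(10_000); // arbitrary timeouts, in milliseconds
            conn.setReadTimeout(10_000);
            try (Reader in = new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8)) {
                return readLines(in);
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

Each line returned by fetch could then be passed to a method like interpretHtml. Keeping readLines separate from the network call also means the parsing can be tried out on a saved copy of the page.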
Does anyone happen to have some information on how to do parsing like this, maybe an example? Or is it a bad idea to parse source code in this way? Please give me your opinion.
Thank you for your time.