
Reading the data from the web page with InputStreamReader works fine. The problem is parsing that data into an HTMLDocument.

The main reason is that the HTML contains special characters that are used incorrectly. There is a doubled ampersand ("&&"), and I believe that is what causes the parsing to crash.

My code looks like this:

URL url = new URL(PageUrl);
URLConnection conn = url.openConnection();
// ... omitted ...

// parsing
HTMLDocument doc = (HTMLDocument)db.parse(conn.getInputStream());
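
A strict XML parser treats every "&" as the start of an entity reference, so a bare "&&" makes the input ill-formed and the parse call throws. As a minimal sketch, assuming `db` above is a `javax.xml.parsers.DocumentBuilder` (the omitted code does not say), one possible workaround is to escape bare ampersands before parsing. The class `AmpersandEscaper` and its methods are hypothetical names introduced only for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper for illustration; not part of the original code.
class AmpersandEscaper {
    // Matches "&" that does not begin a named, decimal, or hex entity reference.
    private static final Pattern BARE_AMP =
            Pattern.compile("&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)");

    // Replaces every bare "&" with "&amp;" and leaves existing references alone.
    static String escapeBareAmpersands(String markup) {
        return BARE_AMP.matcher(markup).replaceAll("&amp;");
    }

    // Escapes bare ampersands, then parses. The rest of the markup must
    // still be well-formed XML for the parse to succeed.
    static Document parseEscaped(String markup) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        byte[] bytes = escapeBareAmpersands(markup).getBytes(StandardCharsets.UTF_8);
        return db.parse(new ByteArrayInputStream(bytes));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parseEscaped("<p>a && b &lt; c</p>");
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

This only addresses the ampersand problem. Unclosed tags, unquoted attributes, or HTML-only entities such as `&nbsp;` would still be rejected by an XML parser, so real-world pages may need more cleanup than this.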

Since I am making an Android application, I don't use the standard parsing functions, because the HTMLDocument object would be too large.

I found many existing examples of parsing HTML, such as those using jsoup, but they are not what I want.

I want to write my own parsing code so that the HTMLDocument object stays small.
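
One way to keep memory use small is to avoid building a document tree at all and to scan the stream for only the values that are needed. The sketch below assumes, purely for illustration, that the values of interest are double-quoted `href` attributes. `LinkScanner` and its methods are hypothetical names, and regex matching of this kind is fragile compared with a real HTML parser.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the original code.
class LinkScanner {
    // Matches href="..." with optional whitespace around "=", any letter case.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Appends every href value found in one line of markup to out.
    static void collectHrefs(String line, List<String> out) {
        Matcher m = HREF.matcher(line);
        while (m.find()) {
            out.add(m.group(1));
        }
    }

    // Reads the source line by line, so only one line plus the collected
    // values are held in memory at a time.
    static List<String> extractHrefs(Reader source) throws IOException {
        List<String> hrefs = new ArrayList<>();
        BufferedReader reader = new BufferedReader(source);
        String line;
        while ((line = reader.readLine()) != null) {
            collectHrefs(line, hrefs);
        }
        return hrefs;
    }

    public static void main(String[] args) throws IOException {
        String page = "<a href=\"a.html\">A</a> && more text\n<a href=\"b.html\">B</a>";
        System.out.println(extractHrefs(new StringReader(page)));
    }
}
```

With the connection from the code above, the reader could be `new InputStreamReader(conn.getInputStream())`. Because nothing here interprets "&", a stray "&&" in the page does not matter. The trade-off is that an attribute split across two lines is missed.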

vvohra87
user1282256

1 Answer


Why don't you use one of the HTML parsers already available for Java? They have community support, so they are the best option.

Open Source HTML Parsers in Java

Carlos Landeras
  • The main reason is that if I use one of the existing HTML parsers for Java, the HTMLDocument object will be megabytes in size, which is too big and will make the Android app slow. If I write my own code, the HTMLDocument object will be kilobytes in size, which is small enough for an Android app and will work much faster. – user1282256 Nov 20 '12 at 22:49
  • Here are examples of parsers coded inside the application. I hope it helps: http://stackoverflow.com/questions/8480130/parsing-html-in-java-for-an-android-app – Carlos Landeras Nov 20 '12 at 22:53