
Jsoup.parse(String html) stops working on a large page. My application uses jsoup several times to parse different pages without trouble, but when I try to parse a big page, jsoup just stops with the error below. Does jsoup have a limit on the maximum size of a page?

java.lang.OutOfMemoryError
at java.lang.Object.internalClone(Native Method)
at java.lang.Object.clone(Object.java:82)
at java.lang.AbstractStringBuilder.append0(AbstractStringBuilder.java:172)
at java.lang.StringBuilder.append(StringBuilder.java:224)
at org.jsoup.parser.Tokeniser.emit(Tokeniser.java:76)
at org.jsoup.parser.TokeniserState$1.read(TokeniserState.java:26)
at org.jsoup.parser.Tokeniser.read(Tokeniser.java:42)
at org.jsoup.parser.TreeBuilder.runParser(TreeBuilder.java:101)
at org.jsoup.parser.TreeBuilder.parse(TreeBuilder.java:53)
at org.jsoup.parser.Parser.parse(Parser.java:24)
at org.jsoup.Jsoup.parse(Jsoup.java:44)
...

EDIT: I took a substring of the page, the first few thousand characters, and jsoup managed to parse it. So it seems there is a limit on the number of characters jsoup can handle. The data type holding the page probably matters here.

EDIT 2: After looking into what could be causing the error, and after trying to write my own HTML parser (which led to a lot of stress), I found out that the Dalvik VM assigns only 4,3 MB to the heap, which I assume varies from machine to machine. I am going to try to increase it.
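
For reference, the heap ceiling the VM reports can be read at runtime with `Runtime.getRuntime().maxMemory()`, which is a standard Java call. The sketch below compares that ceiling with a rough estimate of what parsing a page might need. The class `HeapCheck`, its method names, the 2-bytes-per-character figure, and the overhead factor are all assumptions made for illustration; they are not measurements of jsoup.

```java
public class HeapCheck {
    /**
     * Rough estimate of the bytes needed to parse a page of charCount characters.
     * Assumes 2 bytes per char and multiplies by a caller-chosen overhead factor
     * (a guess that stands in for the parser's buffers and the resulting tree).
     */
    public static long estimateParseBytes(long charCount, int overheadFactor) {
        return charCount * 2L * overheadFactor;
    }

    /** True if the estimate stays below the given heap ceiling in bytes. */
    public static boolean fitsInHeap(long charCount, int overheadFactor, long maxHeapBytes) {
        return estimateParseBytes(charCount, overheadFactor) < maxHeapBytes;
    }

    public static void main(String[] args) {
        // Upper bound on the heap this VM will try to use, in bytes.
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap: " + (maxHeap / (1024 * 1024)) + " MB");

        // Hypothetical page of one million characters, overhead factor of 5.
        System.out.println("Fits: " + fitsInHeap(1_000_000L, 5, maxHeap));
    }
}
```

If the estimate is near or above the reported ceiling, an OutOfMemoryError like the one in the stack trace is a plausible outcome, whatever parser is used.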

  • Are you facing this problem on Android? If yes, then why didn't you include Android tag? You would probably have gotten yourself more replies... – Indrek Kõue Nov 11 '11 at 14:12
  • @SYLARRR I probably just forgot to include that one. Jsoup is a Java library, so it is more generic than Android development; without the Android tag the question would reach more developers than it would as an Android-specific case. But I will include the Android tag, because I discuss Android environment issues here, thanks :) –  Nov 11 '11 at 15:08
  • What heap limit do you mean? 4,3MB or 4.3MB? I don't understand. – Sudarshan Bhat Dec 07 '11 at 04:57
  • @Enigma I am trying to say that the Dalvik VM assigns 4,3 MB to the heap. I don't know about a real device. –  Dec 07 '11 at 10:15

1 Answer


Try fetching the page content with another method, such as HttpClient, and then call

Jsoup.parse(String html);
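
A minimal sketch of that idea, using the standard library's `HttpURLConnection` in place of HttpClient so the example has no extra dependencies. The class `PageFetcher` and its method names are made up for illustration. Note that the whole page still ends up in memory as one String, so this alone may not avoid the OutOfMemoryError on a small heap.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class PageFetcher {
    /** Reads the whole stream into a String using the given charset name. */
    public static String readAll(InputStream in, String charsetName) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        // Copy in fixed-size chunks until the stream is exhausted.
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toString(charsetName);
    }

    /** Downloads the page body as text; UTF-8 is assumed here, not detected. */
    public static String fetch(String pageUrl) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(pageUrl).openConnection();
        try (InputStream in = conn.getInputStream()) {
            return readAll(in, "UTF-8");
        } finally {
            conn.disconnect();
        }
    }
}
```

The returned String can then be handed to the parser, for example `Jsoup.parse(PageFetcher.fetch(url))`, where `url` is whatever page address you are loading.
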
Ali Hashemi