I want to remove ONLY html tags from text with JSOUP. I used solution from here (my previous question about JSOUP) But after some checkings I discovered that JSOUP gets JAVA heap exception: OutOfMemoryError for big htmls but not for all. For example, it fails on html 2Mb and 10000 lines. Code throws an exception in the last line (NOT on Jsoup.parse):
public String StripHtml(String html){
html = html.replace("<", "<").replace(">", ">");
String[] tags = getAllStandardHtmlTags;
Document thing = Jsoup.parse(html);
for (String tag : tags) {
for (Element elem : thing.getElementsByTag(tag)) {
elem.parent().insertChildren(elem.siblingIndex(),elem.childNodes());
elem.remove();
}
}
return thing.html();
}
Is there a way to fix it?