I am having a problem replacing line breaks inside all <pre> elements in a given HTML document using Jsoup.
Here is what I have tried so far, and the problem I am facing.
I am trying to replace all the \n characters with <br> in the inner HTML of the <pre> tags only. I want to leave the rest of the content as it is.
The code is:
String body = "<p>This is the output:</p>\n<pre class=\"lang-xml prettyprint prettyprinted\">\n<code><span class=\"dec\"><!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\" \"http://www.w3.org/TR/html4/loose.dtd\"></span><span class=\"pln\">\n</span><span class=\"tag\"><HTML></span><span class=\"pln\">\n </span><span class=\"tag\"><HEAD></span><span class=\"pln\">\n </span><span class=\"tag\"><META</span><span class=\"pln\"> </span><span class=\"atn\">http-equiv</span><span class=\"pun\">=</span><span class=\"atv\">\"Content-Type\"</span><span class=\"pln\"> </span><span class=\"atn\">content</span><span class=\"pun\">=</span><span class=\"atv\">\"text/html; charset=iso-8859-1\"</span><span class=\"tag\">></span><span class=\"pln\">\n </span><span class=\"tag\"><TITLE></span><span class=\"pln\">GeteBayOfficialTime</span><span class=\"tag\"></TITLE></span><span class=\"pln\">\n </span><span class=\"tag\"></HEAD></span><span class=\"pln\">\n </span><span class=\"tag\"><BODY></span><span class=\"pln\">\n\n* About to connect() to api.ebay.com port 443 (#0)\n* Trying 66.135.211.100... * Timeout\n* Trying 66.135.211.140... * Timeout\n* Trying 66.211.179.150... * Timeout\n* Trying 66.211.179.180... * Timeout\n* Trying 66.135.211.101... * Timeout\n* Trying 66.211.179.148... * Timeout\n* connect() timed out!\n* Closing connection #0\n</span><span class=\"tag\"><P></span><span class=\"pln\">Error sending request</span></code></pre>";
log.info("printing before creating a Jsoup Doc "+ body);
Document bodyDom = Jsoup.parse(body);
log.info("printing after creating a Jsoup Doc "+ bodyDom.html());
Elements preTags = bodyDom.getElementsByTag("pre");
for (Element pre : preTags) {
    pre.html(pre.html().replaceAll("(\r\n|\n)", "<br />"));
    log.info("Pre element with linebreaks replaced -" + pre);
}
body = bodyDom.html();
Here is the log. It seems the HTML source loses its newline characters once I parse it into a Jsoup document:
2013-12-10 10:14:59 INFO FormattingTest:166 - printing before creating a Jsoup Doc <p>This is the output:</p>
<pre class="lang-xml prettyprint prettyprinted">
<code><span class="dec"><!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"></span><span class="pln">
</span><span class="tag"><HTML></span><span class="pln">
</span><span class="tag"><HEAD></span><span class="pln">
</span><span class="tag"><META</span><span class="pln"> </span><span class="atn">http-equiv</span><span class="pun">=</span><span class="atv">"Content-Type"</span><span class="pln"> </span><span class="atn">content</span><span class="pun">=</span><span class="atv">"text/html; charset=iso-8859-1"</span><span class="tag">></span><span class="pln">
</span><span class="tag"><TITLE></span><span class="pln">GeteBayOfficialTime</span><span class="tag"></TITLE></span><span class="pln">
</span><span class="tag"></HEAD></span><span class="pln">
</span><span class="tag"><BODY></span><span class="pln">
* About to connect() to api.ebay.com port 443 (#0)
* Trying 66.135.211.100... * Timeout
* Trying 66.135.211.140... * Timeout
* Trying 66.211.179.150... * Timeout
* Trying 66.211.179.180... * Timeout
* Trying 66.135.211.101... * Timeout
* Trying 66.211.179.148... * Timeout
* connect() timed out!
* Closing connection #0
</span><span class="tag"><P></span><span class="pln">Error sending request</span></code></pre>
2013-12-10 10:14:59 INFO FormattingTest:168 - printing after creating a Jsoup Doc <html>
<head></head>
<body>
<p>This is the output:</p>
<pre class="lang-xml prettyprint prettyprinted">
<code><span class="dec"><!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"></span><span class="pln"> </span><span class="tag"><HTML></span><span class="pln"> </span><span class="tag"><HEAD></span><span class="pln"> </span><span class="tag"><META</span><span class="pln"> </span><span class="atn">http-equiv</span><span class="pun">=</span><span class="atv">"Content-Type"</span><span class="pln"> </span><span class="atn">content</span><span class="pun">=</span><span class="atv">"text/html; charset=iso-8859-1"</span><span class="tag">></span><span class="pln"> </span><span class="tag"><TITLE></span><span class="pln">GeteBayOfficialTime</span><span class="tag"></TITLE></span><span class="pln"> </span><span class="tag"></HEAD></span><span class="pln"> </span><span class="tag"><BODY></span><span class="pln"> * About to connect() to api.ebay.com port 443 (#0) * Trying 66.135.211.100... * Timeout * Trying 66.135.211.140... * Timeout * Trying 66.211.179.150... * Timeout * Trying 66.211.179.180... * Timeout * Trying 66.135.211.101... * Timeout * Trying 66.211.179.148... * Timeout * connect() timed out! * Closing connection #0 </span><span class="tag"><P></span><span class="pln">Error sending request</span></code></pre>
</body>
</html>
2013-12-10 10:14:59 INFO FormattingTest:174 - Pre element with linebreaks replaced - <pre class="lang-xml prettyprint prettyprinted"><code><span class="dec"><!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"></span><span class="pln"> </span><span class="tag"><HTML></span><span class="pln"> </span><span class="tag"><HEAD></span><span class="pln"> </span><span class="tag"><META</span><span class="pln"> </span><span class="atn">http-equiv</span><span class="pun">=</span><span class="atv">"Content-Type"</span><span class="pln"> </span><span class="atn">content</span><span class="pun">=</span><span class="atv">"text/html; charset=iso-8859-1"</span><span class="tag">></span><span class="pln"> </span><span class="tag"><TITLE></span><span class="pln">GeteBayOfficialTime</span><span class="tag"></TITLE></span><span class="pln"> </span><span class="tag"></HEAD></span><span class="pln"> </span><span class="tag"><BODY></span><span class="pln"> * About to connect() to api.ebay.com port 443 (#0) * Trying 66.135.211.100... * Timeout * Trying 66.135.211.140... * Timeout * Trying 66.211.179.150... * Timeout * Trying 66.211.179.180... * Timeout * Trying 66.135.211.101... * Timeout * Trying 66.211.179.148... * Timeout * connect() timed out! * Closing connection #0 </span><span class="tag"><P></span><span class="pln">Error sending request</span></code></pre>
I am not sure what is wrong. The same code works with another HTML source: "\nResponse :\n some text \n \ndsjkhskjdh sdjhasjkdas \n"
It gets properly converted to:
Response :
some text
dsjkhskjdh sdjhasjkdas
I am not sure why the first sample does not.
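To isolate the symptom, here is a minimal sketch that uses only the JDK and no Jsoup at all. The class name, the helper name, and the sample strings are hypothetical and not taken from my project. It applies the same regex as the loop above to a string that still contains newlines and to one where they have already turned into spaces, as in the "after creating a Jsoup Doc" log entry. If this is representative, the regex itself looks fine and simply has nothing left to match by the time pre.html() is called.

```java
class LineBreakReplaceSketch {

    // Hypothetical helper: the same regex used in the loop over the <pre> elements,
    // applied to a plain string instead of a Jsoup element.
    public static String replaceLineBreaks(String html) {
        return html.replaceAll("(\r\n|\n)", "<br />");
    }

    public static void main(String[] args) {
        // Newlines still present: every line break becomes a <br />.
        String withNewlines = "line one\nline two\r\nline three";
        System.out.println(replaceLineBreaks(withNewlines));
        // prints: line one<br />line two<br />line three

        // Newlines already collapsed to spaces: the regex matches nothing,
        // so the string comes back unchanged.
        String collapsed = "line one line two line three";
        System.out.println(replaceLineBreaks(collapsed));
        // prints: line one line two line three
    }
}
```

So the open question is why the newlines are gone from the <pre> content before the replacement runs, not why the replacement fails.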