I'm using Jsoup's parseBodyFragment() and parse() methods to work with blocks of code made up of script, noscript, and style tags. The goal isn't to clean them, just to select(), analyze, and output them. The select() portion works really well.
However, Jsoup automatically encodes the URL parameters of src attributes on output. So, when the input is this:
<noscript>
<img height="1" width="1" style="display:none;" alt="" src="https://something.orother.com/i/cnt?txn_id=123&p_id=123"/>
</noscript>
I end up with this, returned from Jsoup via the outerHtml() method:
<noscript>
<img height="1" width="1" style="display:none;" alt="" src="https://something.orother.com/i/cnt?txn_id=123&amp;p_id=123"/>
</noscript>
The issue is that the standard ampersand (&) in the URL parameter is being encoded and output as &amp;. Is there a way to disable this?
I'm looking for a way to get the HTML of the selected element without modification. Thanks!
Update (2/23/2016): Clarified problem. Also, found an issue on the GitHub repo describing the problem: https://github.com/jhy/jsoup/issues/372. Looks like this might not be possible.
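If there is no switch for this, one possible stopgap is to post-process the serialized string instead. The following is only a rough sketch under that assumption: it uses nothing from Jsoup, just java.util.regex, and the class and method names (SrcAmpersandRestorer, restoreSrcAmpersands) are hypothetical. It only touches double-quoted src attribute values and turns &amp; back into &, so it would not help with other attributes or other escaped characters.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Jsoup: reverts "&amp;" to "&" inside
// double-quoted src attribute values of an already-serialized HTML string.
public class SrcAmpersandRestorer {

    // group 1: the attribute name up to the opening quote, e.g. src="
    // group 2: the attribute value
    // group 3: the closing quote
    // The lookbehind keeps names such as data-src from matching.
    private static final Pattern SRC_ATTR = Pattern.compile(
            "((?<![\\w-])src\\s*=\\s*\")([^\"]*)(\")",
            Pattern.CASE_INSENSITIVE);

    static String restoreSrcAmpersands(String html) {
        Matcher m = SRC_ATTR.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Only the attribute value is changed; the rest of the match is copied as-is.
            String value = m.group(2).replace("&amp;", "&");
            String rebuilt = m.group(1) + value + m.group(3);
            // quoteReplacement stops "$" or "\" in the URL from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(rebuilt));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the string returned by outerHtml().
        String serialized =
                "<img alt=\"\" src=\"https://something.orother.com/i/cnt?txn_id=123&amp;p_id=123\">";
        System.out.println(restoreSrcAmpersands(serialized));
    }
}
```

This is string surgery on the output rather than a real fix, so text content and other attributes keep whatever escaping the serializer applied.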