I am using the Jsoup library.

After the execution of the following code:

Document doc = new Document(language);

File input = new File("filePath" + "filename.html");
PrintWriter writer = new PrintWriter(input, "UTF-8");

String contentType = "<%@ page contentType=\"text/html; charset=UTF-8\" %>";
doc.appendText(contentType);

writer.write(doc.toString());
writer.flush();
writer.close();

The output HTML file contains the following line of text:

&lt;%@ page contentType=&quot;text/html; charset=UTF-8&quot; %&gt;

instead of

<%@ page contentType="text/html; charset=UTF-8" %>

What could be the problem?

Dan

2 Answers

Those are HTML entity escapes, which keep the browser from treating the characters as HTML tags. It's not a problem: the text will render correctly when you open the page in a browser.
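
To illustrate why the line comes out escaped: appendText adds the string as text content, and text content is entity-escaped when the document is serialized. The sketch below imitates that step in plain Java. The `escapeText` helper is hypothetical and is not Jsoup's actual implementation; it only approximates the escaping seen in the question.

```java
// Hypothetical helper that imitates how an HTML serializer escapes text content.
// This is NOT Jsoup's code; it only reproduces the escaping shown in the question.
class TextEscapeSketch {

    static String escapeText(String text) {
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String directive = "<%@ page contentType=\"text/html; charset=UTF-8\" %>";
        // Markup characters inside text become entities on output.
        System.out.println(escapeText(directive));
    }
}
```

Running this prints the same escaped line the question reports in the output file.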

Aswin

Some problems here:

Document doc = new Document(language);

Don't do this. The Document constructor takes a base URI, not a language, and creates an empty document. Use Jsoup.parse(...) instead.

<%@ page contentType="text/html; charset=UTF-8" %>

This is not HTML, and will probably not get parsed correctly.

Now, for your problem: you should use something like

Document document = Jsoup.parse(new ByteArrayInputStream(myHtmlString.getBytes(StandardCharsets.UTF_8)), "UTF-8", BaseUrl);

Check the Jsoup documentation for Document.OutputSettings, which you may need to adjust.
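
Since the directive is not HTML, one option is to keep it out of the Jsoup document altogether and write it as raw text ahead of the serialized markup. A minimal sketch in plain Java, assuming the markup string comes from something like doc.outerHtml(); the class and method names (`JspPageSketch`, `render`) are hypothetical:

```java
// Sketch: prepend the JSP directive as raw text instead of adding it to the document.
class JspPageSketch {

    static final String DIRECTIVE =
            "<%@ page contentType=\"text/html; charset=UTF-8\" %>";

    // html is assumed to be the already serialized document,
    // e.g. the result of doc.outerHtml() (an assumption, not shown here).
    static String render(String html) {
        return DIRECTIVE + "\n" + html;
    }

    public static void main(String[] args) {
        String html = "<html><head></head><body></body></html>";
        // The directive is never passed through an HTML serializer, so it stays unescaped.
        System.out.println(render(html));
    }
}
```

The returned string can then be passed to writer.write(...) exactly as in the question's code.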

Jonas Czech