I am trying to parse an MHT document using Jsoup (version 1.7.3). The goal is to open two files and merge them (joining head and body) into one complete file. But first I have a problem parsing a single MHT file: the parsed result is missing a significant amount of information, and the file can't be opened after parsing. What I did is the following:
- Create an MHT file using Word (containing one image and some text)
- Parse it to a String using Jsoup
- Write the string to a file
- Open the written file: it is broken
I used the following code:
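    import java.io.BufferedWriter;
    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.OutputStreamWriter;
    import java.io.Writer;
    import java.nio.charset.Charset;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.junit.Test;

    private static final String USED_CHARSET = "windows-1252";
    private static final String PATH = "C:\\Test\\";
    private static final Charset CHARSET = Charset.forName(USED_CHARSET);

    @Test
    public void test() throws IOException {
        // Jsoup reads the whole MHT file as if it were HTML
        Document doc = Jsoup.parse(new File(PATH, "sourceMht.mht"), USED_CHARSET);
        writeDoc(new File(PATH, "parsedMht.mht"), doc.html());
    }

    // Writes the serialized document to disk with the same charset used for reading
    private void writeDoc(File file, String html) throws IOException {
        Writer out = new BufferedWriter(new OutputStreamWriter(
                new FileOutputStream(file), CHARSET));
        try {
            out.write(html);
        } finally {
            // close() also flushes the buffer
            out.close();
        }
    }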
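One thing that may be relevant: as far as I understand, an MHT file saved by Word is a MIME multipart container (header lines, boundaries, encoded parts) and not a plain HTML file. Below is a minimal sketch for checking whether a file still starts with such headers, which could be run on both the source and the parsed file. It uses only the JDK; `MhtHeaderCheck` and `looksLikeMimeMultipart` are hypothetical names I made up for illustration and are not part of Jsoup.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;

// Hypothetical helper (not part of Jsoup): reports whether a file begins with
// a MIME header block that declares a multipart Content-Type.
class MhtHeaderCheck {

    static boolean looksLikeMimeMultipart(Path file) throws IOException {
        // ISO-8859-1 maps every byte to a character, so reading never fails
        // on odd bytes; the header lines themselves are plain ASCII.
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.ISO_8859_1)) {
            String line;
            // The MIME header block ends at the first empty line.
            while ((line = in.readLine()) != null && !line.isEmpty()) {
                String lower = line.toLowerCase(Locale.ROOT);
                if (lower.startsWith("content-type:") && lower.contains("multipart/")) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

If this returns true for sourceMht.mht but false for parsedMht.mht, that would suggest the MIME structure is lost when the whole file goes through the HTML parser. This is only a diagnostic under that assumption, not a fix.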
Thanks for your help.