Possible Duplicate:
Convert Word doc to HTML programmatically in Java
I have a program that takes a .docx file and opens it as an .html file, but after the conversion all I get is unreadable strings. I need the HTML of this file because I have to parse it later. When I use the method below to open the file, I get unreadable text such as: úL]iN?#tBd!?^ý ?e"0©?®??AäúsIp?¸ü?D?ÂÓâ¨\Dâ>½??Eâcr&Æl\Fâÿ2qJ?U ??IúK&þIb
FileInputStream fileInput = null;
BufferedInputStream myBuffer = null;
DataInputStream dataInput = null;
fileInput = new FileInputStream(selectedFile);
myBuffer = new BufferedInputStream(fileInput);
dataInput = new DataInputStream(myBuffer);
StringBuilder nHtmlText = new StringBuilder();
while (dataInput.available() != 0) {
    // Read each line once; calling readLine() twice per iteration would skip every other line.
    String line = dataInput.readLine();
    System.out.println(line);
    nHtmlText.append(line);
}
htmlText = nHtmlText.toString();
Is there some way to get clean, readable HTML out of this file so that I can parse it and save it?
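For context, a .docx file is a ZIP archive of XML parts, not plain text, which would explain why reading it line by line prints binary-looking characters. The following is only a minimal sketch using the JDK's `java.util.zip` classes: it pulls out the main document part, `word/document.xml`, as a readable string. The class name `DocxXmlReader` and the method `readDocumentXml` are hypothetical names chosen for illustration, and the sketch assumes the part is UTF-8 encoded. Note that the result is WordprocessingML, not HTML, so a real conversion would still need a dedicated library or a transformation step on top of this.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper: extracts the main XML part from a .docx package.
public class DocxXmlReader {

    // Reads the ZIP stream entry by entry and returns the text of word/document.xml.
    // Assumption: the part is encoded as UTF-8.
    public static String readDocumentXml(InputStream in) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("word/document.xml".equals(entry.getName())) {
                    ByteArrayOutputStream out = new ByteArrayOutputStream();
                    byte[] buffer = new byte[8192];
                    int n;
                    // read() returns -1 at the end of the current entry
                    while ((n = zip.read(buffer)) != -1) {
                        out.write(buffer, 0, n);
                    }
                    return new String(out.toByteArray(), StandardCharsets.UTF_8);
                }
            }
        }
        throw new IOException("word/document.xml not found; is this a .docx file?");
    }

    // Usage: java DocxXmlReader path/to/file.docx
    public static void main(String[] args) throws IOException {
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.println(readDocumentXml(in));
        }
    }
}
```

In the code from the question, this would amount to passing `new FileInputStream(selectedFile)` to `readDocumentXml` instead of wrapping it in a `DataInputStream`.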