I need to convert a file from the ISO-8859-2 charset to UTF-8.

My code is:

    DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    Document doc = builder.parse(file);

    DOMSource domSource = new DOMSource(doc);

    String fileName2 = UUID.randomUUID().toString() + "222";
    Writer out = new OutputStreamWriter(new FileOutputStream("/Users/user/Kohana/" + fileName2 + ".xml"), "UTF8");

    Transformer transformer = TransformerFactory.newInstance().newTransformer();
    transformer.setOutputProperty(OutputKeys.INDENT, "yes");
    transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");
    transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
    transformer.transform(domSource, new StreamResult(out));

The problem is that after the transform, the file is still ISO-8859-2.

What am I doing wrong?

Ilkar
  • Is it `UTF8` or `UTF-8`? – Алексей Jan 10 '14 at 23:04
  • I want it to be UTF-8, but after your question I checked UTF8 and my file is still ISO-... – Ilkar Jan 10 '14 at 23:06
  • Does the original XML declare its encoding as ISO-8859-2? – erickson Jan 10 '14 at 23:06
  • Recode it outside of XSLT, simply by reading the file in and writing it out again with the appropriate encodings. The content MUST match the ` – Thorbjørn Ravn Andersen Jan 10 '14 at 23:08
  • Related: http://stackoverflow.com/questions/15592025/transformer-setoutputpropertyoutputkeys-encoding-utf-8-is-not-working In some versions of the JDK it seems that there's a bug surrounding setting the encoding to UTF-8. This *may* be something to do with your issue - try upgrading to the latest JDK if you haven't already. – Michael Berry Jan 10 '14 at 23:09
  • 1
    @Ilkar what i mean is that you create a new `OutputStreamWriter` to be `UTF8` but i think it should be `UTF-8`. – Алексей Jan 10 '14 at 23:10

1 Answer

  1. Read the XML using an "ISO-8859-2" Reader / Stream.
  2. Use an XSLT transform to add the <?xml version="1.0" encoding="utf-8"?> header.
  3. Write the result using a Writer / Stream constructed with "UTF-8" encoding.
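
The three steps above could be sketched roughly as follows. This is a minimal illustration, not the answerer's own code: the class name `XmlRecoder` and the method `recode` are hypothetical, and it assumes the JDK in use ships the ISO-8859-2 charset. Instead of a separate XSLT stylesheet, it relies on the identity `Transformer` and its output properties to emit the XML declaration, and it hands the transformer an `OutputStream` so that the `ENCODING` property alone decides which bytes are written.

```java
import java.io.OutputStream;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class; the name is an assumption for illustration only.
public class XmlRecoder {

    // Reads an ISO-8859-2 XML file and writes it back out as UTF-8.
    public static void recode(Path in, Path out) throws Exception {
        Document doc;
        // Step 1: decode the input explicitly as ISO-8859-2. Because the parser
        // receives a character stream, it does not need to guess the encoding.
        try (Reader reader = Files.newBufferedReader(in, Charset.forName("ISO-8859-2"))) {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(reader));
        }

        // Step 2: ask the identity transformer to emit an XML declaration
        // that names UTF-8 as the encoding.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");

        // Step 3: write to a byte stream so the transformer itself encodes
        // the characters as UTF-8.
        try (OutputStream os = Files.newOutputStream(out)) {
            transformer.transform(new DOMSource(doc), new StreamResult(os));
        }
    }
}
```

Calling `XmlRecoder.recode(Paths.get("in.xml"), Paths.get("out.xml"))` with placeholder paths should leave `out.xml` with a UTF-8 declaration and UTF-8 encoded content. To confirm the result, inspect the raw bytes rather than relying on what an editor reports as the file's encoding.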
MGorgon