Parse UTF-8 BOM XML document with newDocumentBuilder.parse()

Question

I am trying to parse an XML document from a URL. The document is UTF-8 encoded with a BOM, and I cannot get my code to strip the leading characters ï»¿ so that I can use JAXB on the document. I have tried:

Document k = factory.newDocumentBuilder().parse(new URL(url).openStream());

I have also tried:

String defaultEncoding = "UTF-8";
try {
    // Wrap the URL stream so that a leading BOM can be detected.
    BOMInputStream bomInputStream = new BOMInputStream(new URL(url).openStream());
    ByteOrderMark bom = bomInputStream.getBOM();
    // Fall back to UTF-8 when no BOM is found.
    String charsetName = bom == null ? defaultEncoding : bom.getCharsetName();
    InputSource source = new InputSource(new BufferedInputStream(bomInputStream));
    source.setEncoding(charsetName);
    System.out.println("Passed!");
    // Hand the wrapped stream to the DOM parser.
    Document k = factory.newDocumentBuilder().parse(source);
} catch (IOException | ParserConfigurationException | SAXException e) {
    e.printStackTrace();
    return null;
}

It just does not seem to work.
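
For reference, the goal described above (drop a leading UTF-8 BOM, then pass the stream to newDocumentBuilder().parse()) can be sketched with the JDK alone. This is a minimal illustration, not a diagnosis of the failure: the class name BomSkippingParser and the helper skipUtf8Bom are hypothetical names, and an in-memory byte array stands in for the URL stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical helper class; the names are illustrative only.
public class BomSkippingParser {

    /** Returns a stream positioned after a leading UTF-8 BOM (EF BB BF), if one is present. */
    public static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        // Read up to three bytes; a single read() call may return fewer.
        while (n < 3) {
            int r = pushback.read(head, n, 3 - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        boolean hasBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        // No BOM: push the bytes back so the parser sees the whole document.
        if (!hasBom && n > 0) {
            pushback.unread(head, 0, n);
        }
        return pushback;
    }

    /** Parses XML from the stream after discarding any UTF-8 BOM. */
    public static Document parse(InputStream in)
            throws IOException, ParserConfigurationException, SAXException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        return factory.newDocumentBuilder().parse(skipUtf8Bom(in));
    }

    public static void main(String[] args) throws Exception {
        // Assumption: a byte array replaces new URL(url).openStream() for the demo.
        byte[] xml = "\uFEFF<root><item>ok</item></root>".getBytes(StandardCharsets.UTF_8);
        Document doc = parse(new ByteArrayInputStream(xml));
        System.out.println(doc.getDocumentElement().getNodeName()); // prints "root"
    }
}
```

With a real URL, the call would presumably become BomSkippingParser.parse(new URL(url).openStream()). The sketch only handles the UTF-8 BOM; UTF-16 and UTF-32 marks are left untouched.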

Hello, take a look at this post; I think it has several solutions to your problem: https://stackoverflow.com/questions/1835430/byte-order-mark-screws-up-file-reading-in-java — Dilnei Cunha, Oct 26 '17 at 02:30
What do you mean by "it does not seem to work"? Do you get an error? If so, what is the exact error message? Are you using `BOMInputStream` from Apache Commons IO? — Jesper, Oct 26 '17 at 06:59
