I'm using the PrimeFaces <p:fileUpload> component to let users select files.
Its fileUploadListener attribute points to a custom method that reads XML files and parses them with DOMParser. My file reading and parsing code boils down to this:

InputStream inputStream = new FileInputStream("C:\\test\\test_ansi.xml");
Reader inputStreamReader = new InputStreamReader(inputStream, StandardCharsets.UTF_8);
InputSource is = new InputSource(new BufferedReader(inputStreamReader));
DOMParser parser = new DOMParser();
parser.parse(is);
Document doc = parser.getDocument();

I need to convert all user-selected XML files to UTF-8 before processing them further. With a file encoded as ANSI, the code above works fine. With a file encoded as UTF-8, I get the error below:

oracle.xml.parser.v2.XMLParseException; lineNumber: 1; columnNumber: 1; Start of root element expected.
    at oracle.xml.parser.v2.XMLError.flushErrors(XMLError.java:233)
    at oracle.xml.parser.v2.XMLError.error(XMLError.java:133)
    at oracle.xml.parser.v2.XMLError.error(XMLError.java:171)
    at oracle.xml.parser.v2.NonValidatingParser.parseRootElement(NonValidatingParser.java:280)
    at oracle.xml.parser.v2.NonValidatingParser.parseDocument(NonValidatingParser.java:241)
    at oracle.xml.parser.v2.XMLParser.parse(XMLParser.java:124)

For sample UTF-8 files, I'm just saving the file with encoding as UTF-8 in Notepad. Can anyone please help me understand what's going wrong?

Tatha

1 Answer

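It looks like the file saved as UTF-8 contains a Unicode BOM (byte order mark) at the beginning, and that is why parsing fails: the parser expects the document to start with the root element or XML declaration, but the first decoded character is the BOM.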
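To illustrate why the error points at line 1, column 1, here is a minimal sketch (the class and method names are made up for this example) showing that Java's UTF-8 decoder keeps the BOM as a U+FEFF character instead of dropping it, so the first character the parser receives is not `<`:

```java
import java.nio.charset.StandardCharsets;

class BomDemo {
    // Hypothetical helper: decodes the bytes as UTF-8 and returns the first character.
    static char firstDecodedChar(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8).charAt(0);
    }

    public static void main(String[] args) {
        // EF BB BF is the UTF-8 encoding of the BOM, followed here by "<a/>".
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'a', '/', '>'};
        byte[] withoutBom = {'<', 'a', '/', '>'};

        System.out.println(firstDecodedChar(withBom) == '\uFEFF'); // prints true
        System.out.println(firstDecodedChar(withoutBom) == '<');   // prints true
    }
}
```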

You can wrap the FileInputStream in a BOMInputStream (from Apache Commons IO) like this:

InputStream inputStream = new FileInputStream("C:\\test\\test_ansi.xml");
BOMInputStream bomInputStream = new BOMInputStream(inputStream, false);
Reader inputStreamReader = new InputStreamReader(bomInputStream, StandardCharsets.UTF_8);

With false as the second argument, this constructs a BOMInputStream that detects a ByteOrderMark.UTF_8 at the start of the stream and excludes it from the bytes passed on to the reader.
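If adding Commons IO is not an option, the same idea can be sketched with only the JDK. This is not how BOMInputStream is implemented; it is a simplified alternative that handles the UTF-8 BOM only, and the class and method names (`Utf8BomSkipper`, `skipUtf8Bom`, `readUtf8`) are hypothetical:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

class Utf8BomSkipper {
    // Hypothetical helper: returns a stream positioned after a leading UTF-8 BOM, if one is present.
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        // Read up to three bytes; a single read() call may return fewer than requested.
        while (n < 3) {
            int r = pushback.read(head, n, 3 - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        boolean hasBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        // No BOM: push the bytes back so the caller sees the stream unchanged.
        if (!hasBom && n > 0) {
            pushback.unread(head, 0, n);
        }
        return pushback;
    }

    // Hypothetical helper for the demo: reads the whole stream and decodes it as UTF-8.
    static String readUtf8(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        int r;
        while ((r = in.read(buf)) != -1) {
            out.write(buf, 0, r);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'a', '/', '>'};
        InputStream clean = skipUtf8Bom(new ByteArrayInputStream(withBom));
        System.out.println(readUtf8(clean)); // prints <a/>
    }
}
```

In the upload listener, the stream returned by `skipUtf8Bom` would be passed to the InputStreamReader in place of the raw FileInputStream.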

See the BOMInputStream Javadoc for more details.

olgacosta