Let's create an XML file with two attribute values which each contain an extended Unicode character:
    XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();
    try (BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(ERROR_XML), "UTF-8"))) {
        XMLStreamWriter xmlStreamWriter = outputFactory.createXMLStreamWriter(writer);
        xmlStreamWriter.writeStartDocument();
        xmlStreamWriter.writeCharacters("\n");
        xmlStreamWriter.writeStartElement("start");
        xmlStreamWriter.writeAttribute("test1", "11");
        xmlStreamWriter.writeAttribute("test2", "22");
        xmlStreamWriter.writeEndElement();
        xmlStreamWriter.writeEndDocument();
    }
The generated file looks like this:
<?xml version="1.0" ?>
<start test1="11" test2="22"></start>
If this file is read back in and the attribute values are examined:
    XMLInputFactory inputFactory = XMLInputFactory.newInstance();
    try (BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(ERROR_XML), "UTF-8"))) {
        XMLStreamReader xmlStreamReader = inputFactory.createXMLStreamReader(reader);
        xmlStreamReader.nextTag();
        if (XMLStreamReader.START_ELEMENT == xmlStreamReader.getEventType() &&
                "start".equals(xmlStreamReader.getLocalName()))
        {
            System.out.println(xmlStreamReader.getAttributeValue(0));
            System.out.println(xmlStreamReader.getAttributeValue(1));
        }
    }
this will print:
11
22
Astonishingly, the second attribute value contains the extended Unicode character twice!
Every subsequent use of an extended character in an attribute value increases this count. In one case I received attribute values containing 12000 identical characters instead of one. What is happening here?
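Since extended characters are hard to tell apart on a console, a small helper that prints each attribute value as a list of code points may make the duplication easier to see. This is only a sketch: the class `CodePointDump`, its methods `describe` and `countSupplementary`, and the sample character U+1F600 are hypothetical names and values chosen for illustration, not the ones from the file above.

```java
import java.util.stream.Collectors;

class CodePointDump {

    // Hypothetical helper: renders every code point of the value as U+XXXX,
    // so a supplementary character shows up as one entry, not two surrogates.
    static String describe(String value) {
        return value.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    // Hypothetical helper: counts the code points outside the Basic Multilingual Plane.
    static long countSupplementary(String value) {
        return value.codePoints()
                .filter(Character::isSupplementaryCodePoint)
                .count();
    }

    public static void main(String[] args) {
        // Assumed sample value: U+1F600 stands in for whatever extended character is used.
        String sample = "1" + new String(Character.toChars(0x1F600)) + "1";
        System.out.println(describe(sample));
        System.out.println(countSupplementary(sample));
    }
}
```

Calling `describe(xmlStreamReader.getAttributeValue(1))` in place of the plain `println` should show directly how many copies of the extended character the reader returned.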