I have been handed an XML file with instructions to read, edit, and write it using Jackson and Woodstox (as per the recommendation in the documentation). For the most part this has not been too hard; they're both pretty darn good at what they do. At this point, though, I have run into a problem:
My XML objects do themselves contain XML objects. For example:
<XMLObject>
    <OuterObject attributeOne="1" attributeTwo="2" attributeThree="&gt;">
        <InnerObject>&lt;NestedObject&gt;Blah&lt;/NestedObject&gt;</InnerObject>
    </OuterObject>
    <OuterObject attributeOne="11" attributeTwo="22" attributeThree="&lt;">
        <InnerObject>&lt;NestedObject&gt;Blah&lt;/NestedObject&gt;</InnerObject>
    </OuterObject>
    <OuterObject attributeOne="111" attributeTwo="222" attributeThree="3" />
</XMLObject>
The moment that I read the XML file into my Jackson-annotated Java object, all of those instances of &lt; and &gt; are converted by Woodstox into < and >, respectively. When I write the object back out as an XML file, < becomes &lt; but > stays >:
<XMLObject>
    <OuterObject attributeOne="1" attributeTwo="2" attributeThree=">">
        <InnerObject>&lt;NestedObject>Blah&lt;/NestedObject></InnerObject>
    </OuterObject>
    <OuterObject attributeOne="11" attributeTwo="22" attributeThree="&lt;">
        <InnerObject>&lt;NestedObject>Blah&lt;/NestedObject></InnerObject>
    </OuterObject>
    <OuterObject attributeOne="111" attributeTwo="222" attributeThree="3" />
</XMLObject>
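To isolate the read side, here is a minimal sketch that uses only the JDK's built-in StAX reader. That is an assumption made to keep the example free of dependencies; it is not Woodstox or Jackson, and the class and method names are hypothetical. It illustrates that a conforming XML parser hands back attribute values and element text with the predefined entities already decoded, so the original &lt; and &gt; spellings are not visible in the resulting Java String.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper for illustration only; not part of Jackson or Woodstox.
final class EntityDecodingDemo {

    // Returns the parsed (entity-decoded) value of an attribute on the root element.
    static String readRootAttribute(String xml, String attributeName) {
        try {
            XMLStreamReader reader = openAtRoot(xml);
            try {
                return reader.getAttributeValue(null, attributeName);
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Returns the parsed (entity-decoded) text content of the root element.
    static String readRootText(String xml) {
        try {
            XMLStreamReader reader = openAtRoot(xml);
            try {
                return reader.getElementText();
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Creates a reader and advances it to the first START_ELEMENT event.
    private static XMLStreamReader openAtRoot(String xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        while (reader.next() != XMLStreamConstants.START_ELEMENT) {
            // skip any events that precede the root element
        }
        return reader;
    }

    public static void main(String[] args) {
        String attrXml = "<OuterObject attributeThree=\"&gt;\"/>";
        String textXml = "<InnerObject>&lt;NestedObject&gt;Blah&lt;/NestedObject&gt;</InnerObject>";
        System.out.println(readRootAttribute(attrXml, "attributeThree"));
        System.out.println(readRootText(textXml));
    }
}
```

The decoding happens inside the parser, before the value ever reaches the mapped Java object, so the object has no record of how each character was spelled in the file.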
The simplest version of my method that reads the file is as follows:
@RequestMapping("readXML")
public @ResponseBody CustomXMLObject readXML() throws Exception {
    File inputFile = new File(FILE_PATH);
    XmlMapper mapper = new XmlMapper();
    CustomXMLObject value = mapper.readValue(inputFile, CustomXMLObject.class);
    return value;
}
And my Jackson-annotated Java object would look something like this for the example that I gave above:
import com.fasterxml.jackson.annotation.JsonInclude;
import com.fasterxml.jackson.dataformat.xml.annotation.JacksonXmlProperty;

@JsonInclude(JsonInclude.Include.NON_NULL)
public class CustomXMLObject {
    @JacksonXmlProperty(isAttribute=true)
    private long attributeOne;
    @JacksonXmlProperty(isAttribute=true)
    private String attributeTwo;
    @JacksonXmlProperty(isAttribute=true)
    private String attributeThree;
    @JacksonXmlProperty(localName = "InnerObject")
    private String innerObject;

    public long getAttributeOne() {
        return attributeOne;
    }
    public void setAttributeOne(long attributeOne) {
        this.attributeOne = attributeOne;
    }
    public String getAttributeTwo() {
        return attributeTwo;
    }
    public void setAttributeTwo(String attributeTwo) {
        this.attributeTwo = attributeTwo;
    }
    public String getAttributeThree() {
        return attributeThree;
    }
    public void setAttributeThree(String attributeThree) {
        this.attributeThree = attributeThree;
    }
    public String getInnerObject() {
        return innerObject;
    }
    public void setInnerObject(String innerObject) {
        this.innerObject = innerObject;
    }
}
Finally, my dependencies look like this:
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-test</artifactId>
    <scope>test</scope>
</dependency>
<dependency>
    <groupId>com.jayway.jsonpath</groupId>
    <artifactId>json-path</artifactId>
    <scope>test</scope>
</dependency>
<dependency>
    <groupId>com.fasterxml.jackson.module</groupId>
    <artifactId>jackson-module-jaxb-annotations</artifactId>
    <version>2.5.0</version>
</dependency>
<dependency>
    <groupId>com.fasterxml.jackson.dataformat</groupId>
    <artifactId>jackson-dataformat-xml</artifactId>
    <version>2.8.4</version>
</dependency>
<dependency>
    <groupId>org.codehaus.woodstox</groupId>
    <artifactId>woodstox-core-asl</artifactId>
    <version>4.4.1</version>
</dependency>
This appears to be occurring due to Jackson's use of Woodstox' BufferingXmlWriter. This particular writer will intercept those characters and encode them, and there does not appear to be any way to circumvent that decision:
private final void writeAttrValue(String value, int len) throws IOException {
    int inPtr = 0;
    char qchar = this.mEncQuoteChar;
    int highChar = this.mEncHighChar;
    while(true) {
        String ent = null;
        while(true) {
            if(inPtr >= len) {
                return;
            }
            char c = value.charAt(inPtr++);
            if(c <= 60) {
                if(c < 32) {
                    if(c == 13) {
                        if(this.mEscapeCR) {
                            break;
                        }
                    } else {
                        if(c == 10 || c == 9 || this.mXml11 && c != 0) {
                            break;
                        }
                        c = this.handleInvalidChar(c);
                    }
                } else {
                    if(c == qchar) {
                        ent = this.mEncQuoteEntity;
                        break;
                    }
                    if(c == 60) {
                        ent = "&lt;";
                        break;
                    }
                    if(c == 38) {
                        ent = "&amp;";
                        break;
                    }
                }
            } else if(c >= highChar) {
                break;
            }
            if(this.mOutputPtr >= this.mOutputBufLen) {
                this.flushBuffer();
            }
            this.mOutputBuffer[this.mOutputPtr++] = c;
        }
        if(ent != null) {
            this.writeRaw(ent);
        } else {
            this.writeAsEntity(value.charAt(inPtr - 1));
        }
    }
}
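The decompiled method is hard to follow, so here is a simplified model of what I understand its escaping rule to be. It is a hypothetical sketch, not Woodstox code: it ignores control characters, the mEscapeCR and mXml11 flags, and the high-character limit, and it assumes the quote entity is &quot; or &apos; depending on the quote character. The point it illustrates is that > (code 62) falls outside the c <= 60 branch, so it is copied to the output unchanged, while < (60) and & (38) are replaced by entities.

```java
// Simplified, hypothetical model of the attribute-escaping rule; not Woodstox code.
final class AttrEscapeModel {

    // Escapes the quote character, '<' and '&'; every other character,
    // including '>', is copied through unchanged.
    static String escapeAttrValue(String value, char quoteChar) {
        StringBuilder out = new StringBuilder(value.length() + 8);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == quoteChar) {
                // Assumption: the entity matches the quote character in use.
                out.append(quoteChar == '"' ? "&quot;" : "&apos;");
            } else if (c == '<') {
                out.append("&lt;");
            } else if (c == '&') {
                out.append("&amp;");
            } else {
                out.append(c); // '>' lands here and is written as-is
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeAttrValue(">", '"'));
        System.out.println(escapeAttrValue("<", '"'));
    }
}
```

Under this model, an attribute value of > is written back unchanged and < is written as &lt;, which matches the asymmetry in my output file.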
So, to sum up the problem: I have been given an XML file. That XML file contains attributes and elements that, themselves, contain symbols (< and >) that have been encoded (&lt; and &gt;) so as not to break the XML. When Woodstox reads the file, instead of handing my Java object the actual string contained in the XML, it decodes the characters. Upon writing, only < is re-encoded as &lt;. This appears to be happening because Jackson is using Woodstox' BufferingXmlWriter, which does not seem to be configurable to avoid encoding these characters.
As a result, my question is the following:
Can I configure the Jackson object to use a Woodstox XML reader and writer that will allow me to read and write the characters in my XML file without further encoding, or do I need to look into a different solution entirely for my needs?