
I am trying to parse an XML string into a Java object using com.fasterxml.jackson.dataformat.xml.XmlMapper.

The problem is that the XML string contains the character '&'.

Parsing fails with the following exception:

Exception in thread "main" com.fasterxml.jackson.databind.JsonMappingException: Unexpected character '&' in prolog; expected '<'.

Code

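import java.util.Map;
import com.fasterxml.jackson.dataformat.xml.XmlMapper;

public class MyProblem {
   // readValue declares checked exceptions, so main must declare or catch them
   public static void main(String[] args) throws Exception {
      XmlMapper xmlMapper = new XmlMapper();
      // The unescaped '&' below is what triggers the exception
      String myXML = "<cookies>Chocolate&Butter cocunut</cookies>";
      Map<String, String> myTester = xmlMapper.reader().readValue(myXML, Map.class);
      System.out.println(myTester);
   }
}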

I expected parsing to succeed and System.out.println(myTester); to print the resulting map.

After reading XmlMapper's documentation, I believe there is a property I can set to override the default deserialization behavior.

If I need to escape these special characters instead, how do I do that?

hc_dev
Kyle
    Read about [XML prolog](https://www.w3.org/TR/xml/#sec-prolog-dtd), kind of a header containing XML-version information, etc. like ``. It is missing in your given XML string. – hc_dev Dec 13 '22 at 22:03

1 Answer

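Because the ampersand has a special role in XML (it starts an entity reference), it must not appear literally in element text. It must be

  • either enclosed in a CDATA section: "<cookies><![CDATA[Chocolate&Butter cocunut]]></cookies>"
  • or escaped with the predefined entity reference &amp;: "<cookies>Chocolate&amp;Butter cocunut</cookies>"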

Both would be valid XML strings that Jackson and the underlying Woodstox can parse.
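
As a minimal sketch of the escaping step, the following uses only the JDK's built-in DOM parser rather than Jackson, so it runs without extra dependencies. The class XmlEscapeSketch and its methods escapeText and rootText are hypothetical names introduced for illustration; they are not part of Jackson or Woodstox.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration; not part of Jackson.
class XmlEscapeSketch {

    // Escapes characters that must not appear literally in element text.
    // '&' is replaced first so the entities added afterwards are not re-escaped.
    static String escapeText(String raw) {
        return raw.replace("&", "&amp;")
                  .replace("<", "&lt;")
                  .replace(">", "&gt;");
    }

    // Parses the XML string with the JDK parser and returns the root element's text.
    static String rootText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            // Malformed input, such as a raw '&', ends up here.
            throw new IllegalArgumentException("Not well-formed XML: " + xml, e);
        }
    }

    public static void main(String[] args) {
        String raw = "Chocolate&Butter cocunut";
        String escaped = "<cookies>" + escapeText(raw) + "</cookies>";
        String cdata = "<cookies><![CDATA[" + raw + "]]></cookies>";

        System.out.println(escaped);           // <cookies>Chocolate&amp;Butter cocunut</cookies>
        System.out.println(rootText(escaped)); // Chocolate&Butter cocunut
        System.out.println(rootText(cdata));   // Chocolate&Butter cocunut
    }
}
```

Either of the two strings built in main should also be acceptable input for xmlMapper.readValue(..., Map.class), since both are well-formed XML.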

See also XML Spec, 2.4 Character Data and Markup:

The ampersand character (&) and the left angle bracket (<) MUST NOT appear in their literal form, except when used as markup delimiters, or within a comment, a processing instruction, or a CDATA section. If they are needed elsewhere, they MUST be escaped using either numeric character references or the strings "&amp;" and "&lt;" respectively. The right angle bracket (>) may be represented using the string "&gt;", and MUST, for compatibility, be escaped using either "&gt;" or a character reference when it appears in the string "]]>" in content, when that string is not marking the end of a CDATA section.
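
Two details from the quoted passage can be sketched with the JDK parser alone: a numeric character reference (&#38; is the decimal reference for the ampersand), and the common workaround of splitting a literal "]]>" across two CDATA sections. CharRefSketch, toCdata, and rootText are hypothetical names used only for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration only.
class CharRefSketch {

    // Wraps text in a CDATA section. A literal "]]>" is split across two
    // sections so that it cannot terminate the section early.
    static String toCdata(String raw) {
        return "<![CDATA[" + raw.replace("]]>", "]]]]><![CDATA[>") + "]]>";
    }

    // Returns the root element's text using the JDK's DOM parser.
    static String rootText(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML: " + xml, e);
        }
    }

    public static void main(String[] args) {
        // &#38; is the decimal character reference for '&' (U+0026).
        System.out.println(rootText("<cookies>Chocolate&#38;Butter</cookies>")); // Chocolate&Butter
        System.out.println(rootText("<cookies>" + toCdata("a ]]> b & c") + "</cookies>")); // a ]]> b & c
    }
}
```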


hc_dev