
I am parsing XML with the following Java code:

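import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setValidating(true);      // validate against the DTD referenced in the DOCTYPE
factory.setNamespaceAware(true);
DocumentBuilder builder = factory.newDocumentBuilder();
XPath xPath = XPathFactory.newInstance().newXPath();
// SimpleErrorHandler, RequestDoc and CXMLHandlerObj are defined elsewhere in my code
SimpleErrorHandler simpleErrorHandlerObj = new SimpleErrorHandler(RequestDoc);
builder.setErrorHandler(simpleErrorHandlerObj);
// Parse the incoming cXML string through a character stream
InputSource is = new InputSource(new StringReader(CXMLHandlerObj.incoming_cxml));
domdoc = builder.parse(is);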

The XML does not get parsed; the parser throws an error instead.

This is an excerpt of the XML that triggers the error:

  <?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE cXML SYSTEM "http://xml.cxml.org/schemas/cXML/1.2.048/cXML.dtd">
<cXML payloadID="x" timestamp="2021-04-15T23:59:59+00:00" version="1.2.048"><Header>
      <From>
         <Credential
            domain="NetworkId">
            <Identity>x-T</Identity>
         </Credential>
         
         
      <Credential
            domain="SystemID"><Identity>x</Identity></Credential></From>
      <To>
         
         <Credential
                domain="NetworkID"><Identity>x-T</Identity></Credential><Correspondent>
            <Contact
                    role="correspondent">
               <Name
                        xml:lang="EN">x</Name>
               <PostalAddress>
                  <Street>x</Street>
                  <City>x</City>
                  <Country
                    isoCountryCode="NL">x</Country>
               </PostalAddress>
               <Email
            name="routing">x@gmail.com</Email>
            </Contact>
         </Correspondent>
         (the XML continues...)

When parsing the XML, I get this error:

org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; Content is not allowed in prolog.
at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:257)
at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:339)
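One way to narrow down the cause is to parse small test strings the same way the code above does (a `StringReader` wrapped in an `InputSource`). This is a sketch with made-up input, not the actual Ariba payload: `PrologErrorDemo` and `parseError` are hypothetical names, and validation is left off to keep it small since the test strings have no DOCTYPE. An invisible U+FEFF (byte order mark) in front of `<?xml` is a commonly reported cause of "Content is not allowed in prolog"; plain leading spaces also fail, but with the JDK's built-in parser the message is a different one about the `xml` processing-instruction target.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical demo class: parses strings the way the question does.
final class PrologErrorDemo {

    // Returns null if the string parses, otherwise the position and message
    // of the SAXParseException. Validation is left off (no DOCTYPE here).
    static String parseError(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXParseException e) {
            return "line " + e.getLineNumber() + ", column " + e.getColumnNumber()
                    + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        String body = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><cXML/>";
        System.out.println(parseError(body));            // clean input
        System.out.println(parseError("\uFEFF" + body)); // invisible BOM in front
        System.out.println(parseError("  " + body));     // two leading spaces
    }
}
```

Comparing the reported message and column with the one in the stack trace gives a hint about which kind of character precedes the declaration.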

It seems the XML contains whitespace. In my Java code I already remove whitespace between tags with this line:

content = content.replaceAll(">[\\s\r\n]*<", "><");
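As the comments below point out, this regex only matches whitespace that follows a `>`, and nothing precedes the first character of the document. A small check, applying the question's regex to a made-up string shaped like the excerpt (`RegexDemo` is a hypothetical name):

```java
// Hypothetical demo class wrapping the question's regex.
final class RegexDemo {

    // The question's replacement: removes whitespace found between a '>' and a '<'.
    static String collapse(String content) {
        return content.replaceAll(">[\\s\r\n]*<", "><");
    }

    public static void main(String[] args) {
        String xml = "  <?xml version=\"1.0\"?>\n<cXML>\n  <Header/>\n</cXML>";
        String result = collapse(xml);
        System.out.println(result.contains("<cXML><Header/></cXML>")); // true: inner whitespace gone
        System.out.println(result.startsWith("  <?xml"));              // true: leading spaces kept
        // Java's \s does not match U+FEFF, so a byte order mark would survive too.
        System.out.println(collapse("\uFEFF<a/>").charAt(0) == '\uFEFF'); // true
    }
}
```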

When the XML is sent from its original source, the SAP Ariba web portal, I get the error. When I download the same XML, copy it including the whitespace, and send it with Postman, it parses fine. There may also be whitespace in front of the XML declaration, but I am not sure. I am already in contact with SAP Ariba, but is there also a way to fix this in Java code?
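If the payload cannot be fixed at the source, one possible workaround is to drop a leading byte order mark and whitespace before handing the string to the parser, for example on `CXMLHandlerObj.incoming_cxml` before building the `StringReader`. This is a sketch, not SAP Ariba guidance: `PrologCleaner` and `stripLeadingJunk` are hypothetical names, and it assumes the stray characters are only U+FEFF and whitespace.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
final class PrologCleaner {

    // Skips a leading byte order mark (U+FEFF) and any whitespace; the parser
    // rejects both when they precede the XML declaration in a character stream.
    static String stripLeadingJunk(String xml) {
        int i = 0;
        while (i < xml.length()
                && (xml.charAt(i) == '\uFEFF' || Character.isWhitespace(xml.charAt(i)))) {
            i++;
        }
        return xml.substring(i);
    }

    // Same parsing approach as the question, applied to the cleaned string.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(stripLeadingJunk(xml))));
    }

    public static void main(String[] args) throws Exception {
        String incoming = "\uFEFF  \r\n<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<cXML payloadID=\"x\"/>";
        Document doc = parse(incoming);
        System.out.println(doc.getDocumentElement().getAttribute("payloadID")); // x
    }
}
```

Skipping everything up to the first `<` would be more aggressive; the narrower version still lets the parser report genuinely malformed content instead of silently discarding it.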

Sandra Rossi
Nuri Ensing
  • Your regex doesn't trim the whitespace at the beginning of the file. "Could also be?" There could; it's right there. – Dave Newton Apr 20 '21 at 13:55
  • Can you check whether you indeed have whitespace before the XML declaration, or is this just a copy+paste mishap in this question? – dunni Apr 20 '21 at 13:55
  • Open the XML file with a hex editor. See if there are some funny-looking hex prefixes. https://en.wikipedia.org/wiki/Byte_order_mark – Marek Puchalski Apr 20 '21 at 14:04
  • Your regex, besides being dangerous, requires a starting `>`, which wouldn't be present before the prolog. See duplicate links for further details. – kjhughes Apr 20 '21 at 14:12
  • See also: https://stackoverflow.com/questions/499010/java-how-to-determine-the-correct-charset-encoding-of-a-stream – JohannesB Apr 20 '21 at 14:32
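Following the hex-editor and charset suggestions, the first characters can also be inspected from Java before parsing. The sketch assumes the payload arrived as UTF-8 bytes that were decoded into a `String`; Java's UTF-8 decoder does not remove a byte order mark, so it survives as the character U+FEFF. `PrologInspector` and `describeStart` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic class, a quick in-code substitute for a hex editor.
final class PrologInspector {

    // Formats the first `count` chars of the string as U+XXXX code units.
    static String describeStart(String s, int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < Math.min(count, s.length()); i++) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Bytes of a UTF-8 payload that starts with a byte order mark (EF BB BF).
        byte[] bytes = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', '?', 'x', 'm', 'l'};
        // The UTF-8 decoder keeps the BOM as the character U+FEFF.
        String decoded = new String(bytes, StandardCharsets.UTF_8);
        System.out.println(describeStart(decoded, 4)); // U+FEFF U+003C U+003F U+0078
    }
}
```

Logging `describeStart` for the incoming string just before `builder.parse` would show whether the Ariba payload starts with U+FEFF, whitespace, or something else.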

0 Answers