
I have a SOAP message in XML format that contains Chinese characters.

<?xml version="1.0" encoding="UTF-8"?>
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<soapenv:Body>
<new:NewOperation xmlns:new="http://www.example.org/NewWSDLFile/">
    <in>4)  软件应安全、。</in>
</new:NewOperation>
</soapenv:Body>
</soapenv:Envelope>

To parse it, I wrote the Java code below, where `soapMessage` holds the message:

ByteArrayInputStream is = new ByteArrayInputStream(soapMessage.getBytes());
InputStreamReader isr = new InputStreamReader(is, "UTF-8");
InputSource source=new InputSource(isr);
SAXParser parser = new SAXParser();
parser.parse(source);

It cannot parse the Chinese characters and throws the error below. Please help me solve this issue.

[Fatal Error] :1:1: Content is not allowed in prolog.
org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; Content is not allowed in prolog.
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)

I have also tried a DOM parser.

Asked by Naveen Suryawanshi; edited by GAlexMES.

  • Check that the XML doesn't have a BOM: https://en.wikipedia.org/wiki/Byte_order_mark – Arnaud Mar 13 '17 at 08:19
  • Hi, thanks for the update. I tried the code `String s = soapMessage.replaceFirst("^\uFEFF", "");`, but then all my Chinese characters are changed into strings like ` 4) ????? `. – Naveen Suryawanshi Mar 13 '17 at 08:48
  • Where do you see the `????`, in the console? The console may be unable to display those characters. Try writing the values to a new file and checking its content, or simply print the int value of each character read to make sure they aren't the `?` character. – Arnaud Mar 13 '17 at 08:57
  • It's possible that the encoding of your SOAP message is not UTF-8. Save it to a file and open it in a binary editor to check for a UTF-8 BOM (or a BOM from another UTF encoding). If you can, remove it manually. If you cannot modify the stream, see the following answer for removing BOMs dynamically: http://stackoverflow.com/questions/1835430/byte-order-mark-screws-up-file-reading-in-java/1835529#1835529 – diginoise Mar 13 '17 at 09:25
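Both checks suggested in these comments can be sketched in plain Java, assuming the message is already held in a `String`. `BomCheck`, `stripBom` and `codePoints` are hypothetical names used only for illustration: `stripBom` has the same effect as the `replaceFirst` call quoted above, and `codePoints` prints each character's Unicode code point, so genuine Chinese characters can be told apart from a literal `?` (U+003F) without depending on what the console can display.

```java
public class BomCheck {

    /** Removes one leading byte order mark (U+FEFF), if present. */
    static String stripBom(String s) {
        return (!s.isEmpty() && s.charAt(0) == '\uFEFF') ? s.substring(1) : s;
    }

    /** Lists the code points of s as "U+XXXX" values separated by spaces. */
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical message: a BOM followed by markup containing U+8F6F U+4EF6.
        // Unicode escapes keep this source file ASCII-only.
        String soapMessage = "\uFEFF<in>\u8F6F\u4EF6</in>";
        String cleaned = stripBom(soapMessage);
        System.out.println(cleaned.startsWith("<")); // true

        // Intact characters print as U+8F6F U+4EF6; characters already lost print as U+003F U+003F.
        System.out.println(codePoints("\u8F6F\u4EF6")); // U+8F6F U+4EF6
    }
}
```

If the code points are already U+003F before the parser ever sees the string, the characters were most likely lost earlier (for example when the message was converted to a `String` or to bytes), not by the XML parser.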

1 Answer


Please check the question linked below; it already has an answer that may help you.

parsing chinese characters in java showing weird behaviour

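Also, if the `SAXParser` in your code is `javax.xml.parsers.SAXParser` (rather than Xerces's concrete `org.apache.xerces.parsers.SAXParser`), I think it should fail at compile time with the error below: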

Code: SAXParser parser = new SAXParser();

Error: Cannot instantiate the type SAXParser

because `javax.xml.parsers.SAXParser` is an abstract class, which you cannot instantiate directly:

public abstract class javax.xml.parsers.SAXParser
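Because `javax.xml.parsers.SAXParser` is abstract, the usual JAXP route is to obtain an instance from `SAXParserFactory`. The following is a minimal sketch, assuming you only need the text of the `<in>` element; the class `SoapSaxDemo`, the method `readInElement` and the handler are hypothetical, not code from the question. It also strips a leading BOM and encodes the string explicitly as UTF-8, since the no-argument `String.getBytes()` uses the platform default charset.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SoapSaxDemo {

    /** Parses the message with a JAXP SAX parser and returns the text of the first in element. */
    static String readInElement(String soapMessage) {
        // Drop a leading BOM (U+FEFF), if any, so nothing precedes the XML declaration.
        String xml = soapMessage.startsWith("\uFEFF") ? soapMessage.substring(1) : soapMessage;
        // Encode explicitly: the no-argument getBytes() uses the platform default charset.
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);

        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            private boolean inside;
            private boolean done;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (!done && "in".equals(localName)) {
                    inside = true;
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (inside && "in".equals(localName)) {
                    inside = false;
                    done = true; // ignore any later in elements
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inside) {
                    text.append(ch, start, length);
                }
            }
        };

        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // so localName is populated
            SAXParser parser = factory.newSAXParser(); // obtained from the factory, not with new
            InputSource source = new InputSource(new ByteArrayInputStream(bytes));
            source.setEncoding("UTF-8");
            parser.parse(source, handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse SOAP message", e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // Unicode escapes (U+8F6F U+4EF6, the word for "software") keep this file ASCII-only.
        String msg = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<soapenv:Envelope xmlns:soapenv=\"http://schemas.xmlsoap.org/soap/envelope/\">"
                + "<soapenv:Body><new:NewOperation xmlns:new=\"http://www.example.org/NewWSDLFile/\">"
                + "<in>4)  \u8F6F\u4EF6</in>"
                + "</new:NewOperation></soapenv:Body></soapenv:Envelope>";
        System.out.println(readInElement(msg).equals("4)  \u8F6F\u4EF6")); // true
    }
}
```

Passing bytes plus an explicit encoding lets the parser do the decoding itself, so the only place the charset matters on your side is the `getBytes` call.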
Answered by Ashu Phaugat