35

I am trying to fetch the XML below from a database using a Java method, but I am getting an error.

Code used to parse the XML:

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();

InputSource is = new InputSource(new ByteArrayInputStream(cond.getBytes()));

Document doc = db.parse(is);

Element elem = doc.getDocumentElement();

// here we expect a series of <data><name>N</name><value>V</value></data>
NodeList nodes = elem.getElementsByTagName("data");

TableID jobId = new TableID(_processInstanceId);
Job myJob = Job.queryByID(_clientContext, jobId, true);

if (nodes.getLength() == 0) {
    log(Level.DEBUG, "No data found on condition XML");

}

for (int i = 0; i < nodes.getLength(); i++) {
    // loop through the <data> in the XML

    Element dataTags = (Element) nodes.item(i);
    String name = getChildTagValue(dataTags, "name");
    String value = getChildTagValue(dataTags, "value");

    log(Level.INFO, "UserData/Value=" + name + "/" + value);

    myJob.setBulkUserData(name, value);
}

myJob.save();

The Data

<ContactDetails>307896043</ContactDetails>
<ContactName>307896043</ContactName>
<Preferred_Completion_Date>
</Preferred_Completion_Date>
<service_address>A-End Address: 1ST HELIERST HELIERJT2 3XP832THE CABLES 1 POONHA LANEST HELIER JE JT2 3XP</service_address>
<ServiceOrderId>315473043</ServiceOrderId>
<ServiceOrderTypeId>50</ServiceOrderTypeId>
<CustDesiredDate>2013-03-20T18:12:04</CustDesiredDate>
<OrderId>307896043</OrderId>
<CreateWho>csmuser</CreateWho>
<AccountInternalId>20100333</AccountInternalId>
<ServiceInternalId>20766093</ServiceInternalId>
<ServiceInternalIdResets>0</ServiceInternalIdResets>
<Primary_Offer_Name  action='del'>MyMobile Blue &#163;44.99 [12 month term]</Primary_Offer_Name>
<Disc_Reason  action='del'>8</Disc_Reason>
<Sup_Offer  action='del'>80000257</Sup_Offer>
<Service_Type  action='del'>A-01-00</Service_Type>
<Priority  action='del'>4</Priority>
<Account_Number  action='del'>0</Account_Number>
<Offer  action='del'>80000257</Offer>
<msisdn  action='del'>447797142520</msisdn>
<imsi  action='del'>234503184</imsi>
<sim  action='del'>5535</sim>
<ocb9_ARM  action='del'>false</ocb9_ARM>
<port_in_required  action='del'>
</port_in_required>
<ocb9_mob  action='del'>none</ocb9_mob>
<ocb9_mob_BB  action='del'>
</ocb9_mob_BB>
<ocb9_LandLine  action='del'>
</ocb9_LandLine>
<ocb9_LandLine_BB  action='del'>
</ocb9_LandLine_BB>
<Contact_2>
</Contact_2>
<Acc_middle_name>
</Acc_middle_name>
<MarketCode>7</MarketCode>
<Acc_last_name>Port_OUT</Acc_last_name>
<Contact_1>
</Contact_1>
<Acc_first_name>.</Acc_first_name>
<EmaiId>
</EmaiId>

The Error

 org.apache.xerces.impl.io.MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence.

I read in some threads that this is caused by special characters in the XML. How can I fix this issue?

Ashish Aggarwal
shaiksha
  • As you might have noticed your question is hard to understand without proper formatting. – Kai Mar 21 '13 at 11:06
  • 4
    It doesn't help that you haven't shown any code, but I suspect your XML file is basically invalid. I suspect it's claiming to be UTF-8 but *isn't* UTF-8. You should fix whatever's producing the bad file. – Jon Skeet Mar 21 '13 at 11:06
  • Definitely check the database; if the data is correctly stored as UTF-8, check whether the Java connector needs a setting for UTF-8 (this is the case for MySQL). If the database is wrongly defined, take the effort to switch to UTF-8, as it is more versatile. – Joop Eggen Mar 21 '13 at 11:19
  • Hi, can someone tell me where this is defined in the DB? – shaiksha Mar 22 '13 at 22:18
  • Can you show a hex-dump of the first few dozen bytes of the input? – Mike Samuel Apr 26 '13 at 18:41
  • Also, your data may be a valid XML document *fragment*, but it is definitely not a valid XML document because there are multiple elements at the root, while XML documents have to have exactly one root element so `db.parse` will fail even after you fix the immediate problem. – Mike Samuel Apr 26 '13 at 18:43
  • for followers, this error message may actually mean you have "weird bytes" at the *end* of your XML document, not necessarily the beginning. In my case it was some binary checksum stuff at the end that wasn't valid UTF-8 but the beginning all was :) – rogerdpack Mar 17 '16 at 21:43

14 Answers

23

How to fix this issue ?

Read the data using the correct character encoding. The error message means that you are trying to read the data as UTF-8 (either deliberately or because that is the default encoding for an XML file that does not specify <?xml version="1.0" encoding="somethingelse"?>) but it is actually in a different encoding such as ISO-8859-1 or Windows-1252.

To be able to advise on how you should do this I'd have to see the code you're currently using to read the XML.
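
As a minimal sketch of what "reading with the correct encoding" can look like for a DOM parse: the class name, the `parse` helper and the sample element below are hypothetical, and ISO-8859-1 is only an assumption about what the data really contains.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ExplicitEncodingParse {

    // Parses XML bytes and tells the parser which encoding the bytes are in,
    // instead of letting it assume UTF-8.
    public static Document parse(byte[] xmlBytes, String encoding) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        InputSource is = new InputSource(new ByteArrayInputStream(xmlBytes));
        is.setEncoding(encoding);
        return db.parse(is);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample: a pound sign stored as the single byte 0xA3 (ISO-8859-1).
        byte[] latin1 = "<offer>MyMobile Blue \u00A344.99</offer>"
                .getBytes(StandardCharsets.ISO_8859_1);
        Document doc = parse(latin1, "ISO-8859-1");
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

If the bytes are produced from a String in the same program, it is usually simpler to encode them as UTF-8 in the first place than to tell the parser about a different encoding.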

Ian Roberts
  • I am getting this error when trying to parse the XML using the code below. – shaiksha Apr 26 '13 at 18:37
  • 4
    Thank you all, I managed to fix the issue by setting the encoding to ISO-8859-1 before parsing. I added the line `is.setEncoding("ISO-8859-1");` to the existing code, between `InputSource is = new InputSource(new ByteArrayInputStream(cond.getBytes()));` and `Document doc = db.parse(is);`. – shaiksha May 01 '13 at 16:10
23
  1. Open the XML file in Notepad.
  2. Make sure there is no extra whitespace at the beginning or end of the document.
  3. Select File -> Save As.
  4. Under "Save as type", select "All files".
  5. Enter the file name, for example abcd.xml.
  6. Under "Encoding", select UTF-8 and click Save.
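
The same re-save can be sketched in Java for cases where opening an editor is not practical. This is only an illustration: the class and method names are hypothetical, and the source charset (ISO-8859-1 here) is an assumption you have to replace with whatever the file is really encoded in.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReencodeToUtf8 {

    // Reads the file using sourceCharset, trims leading and trailing whitespace
    // (step 2 above), and writes the result as UTF-8.
    public static void reencode(Path source, Charset sourceCharset, Path target)
            throws IOException {
        String text = new String(Files.readAllBytes(source), sourceCharset).trim();
        Files.write(target, text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        Path in = Files.createTempFile("abcd-in", ".xml");
        Path out = Files.createTempFile("abcd-out", ".xml");
        // Hypothetical input: a pound sign stored as the single byte 0xA3.
        Files.write(in, " <price>\u00A344.99</price> "
                .getBytes(StandardCharsets.ISO_8859_1));
        reencode(in, StandardCharsets.ISO_8859_1, out);
        System.out.println(new String(Files.readAllBytes(out), StandardCharsets.UTF_8));
    }
}
```

If the file starts with an XML declaration that names the old encoding, that declaration has to be updated as well, otherwise the parser will still read the new bytes with the old charset.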
Barani r
7

Try:

// inputStream is the InputStream that delivers the XML from your database.
// The Reader decodes the bytes, so the parser receives characters, not bytes.
Reader reader = new InputStreamReader(inputStream, "UTF-8");

InputSource is = new InputSource(reader);
is.setEncoding("UTF-8");

saxParser.parse(is, handler);

If the data is in an encoding other than UTF-8, change the encoding argument to the correct one.

LaGrandMere
  • I think this is the best answer because it handles the described error for all types of InputStreams, not only files. – sinedsem Dec 25 '15 at 12:00
6

I was getting the xml as a String and using xml.getBytes() and getting this error. Changing to xml.getBytes(Charset.forName("UTF-8")) worked for me.
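
A minimal sketch of the difference, assuming a document without an XML declaration (so the parser treats the bytes as UTF-8). The class name, the `parseText` helper and the sample element are hypothetical; ISO-8859-1 stands in for a non-UTF-8 platform default that a bare `getBytes()` call might pick up.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class GetBytesCharset {

    // Encodes the XML string with the given charset and parses the resulting bytes.
    public static String parseText(String xml, Charset charset) throws Exception {
        byte[] bytes = xml.getBytes(charset);
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(bytes));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) {
        String xml = "<offer>MyMobile Blue \u00A344.99</offer>"; // hypothetical sample
        try {
            // Explicit UTF-8: the bytes match what the parser expects.
            System.out.println(parseText(xml, StandardCharsets.UTF_8));
            // Simulates a non-UTF-8 default charset: the pound sign becomes the
            // single byte 0xA3, which is not a valid UTF-8 sequence.
            System.out.println(parseText(xml, StandardCharsets.ISO_8859_1));
        } catch (Exception e) {
            System.out.println("Parse failed: " + e);
        }
    }
}
```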

John
  • 1
    This worked for me. Everyone else was "wrong" in terms of my problem. I was doing the same thing as you. Reading a file as string, getting the bytes as non-UTF8 and getting the SAX error. That `getBytes("UTF-8")` worked. – Magic Octopus Urn May 02 '18 at 17:03
2

I had the same problem in my JSF application, which had a comment line containing some special characters in an XHTML page. When I compared it with the previous version in Eclipse, it had this comment:

//Some �  special characters found

After I removed those characters, the page loaded fine. The error is mostly related to XML files, so compare the failing file with a working version.

Lucky
1

I had this problem even though the file was in UTF-8; somehow one character had got in that was not encoded in UTF-8. To solve the problem I did what is described in this thread, i.e. I validated the file: How to check whether a file is valid UTF-8?

Basically you run the command:

$ iconv -f UTF-8 your_file -o /dev/null

If anything in the file is not valid UTF-8, iconv reports where the conversion failed (a byte position, or a line and column, depending on the iconv implementation) so that you can find it.
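
If iconv is not available, a similar check can be sketched in Java with a strict `CharsetDecoder`. The class and method names are hypothetical; the sketch reports a byte offset rather than a line number.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Check {

    // Returns the byte offset of the first invalid UTF-8 sequence,
    // or -1 if the whole input decodes cleanly.
    public static int firstInvalidOffset(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(data);
        CharBuffer out = CharBuffer.allocate(1024);
        while (true) {
            CoderResult result = decoder.decode(in, out, true);
            if (result.isError()) {
                // With REPORT, the input position stays at the start of the bad sequence.
                return in.position();
            }
            if (result.isUnderflow()) {
                return -1; // all input consumed without errors
            }
            out.clear(); // output buffer full: discard decoded chars and continue
        }
    }

    public static void main(String[] args) {
        byte[] bad = {'a', 'b', (byte) 0xA3, 'c'}; // 0xA3 alone is not valid UTF-8
        System.out.println(firstInvalidOffset(bad));
        System.out.println(firstInvalidOffset("abc".getBytes(StandardCharsets.UTF_8)));
    }
}
```

To check a file, pass in the result of `Files.readAllBytes(...)`; for very large files a streaming variant would be preferable.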

Robert Sjödahl
1

I happened to run into this problem because of an Ant build.

That Ant build took files and applied filterchain expandproperties to it. During this file filtering, my Windows machine's implicit default non-UTF-8 character encoding was used to generate the filtered files - therefore characters outside of its character set could not be mapped correctly.

One solution was to provide Ant with an explicit environment variable for UTF-8. In Cygwin, before launching Ant: export ANT_OPTS="-Dfile.encoding=UTF-8".

Abdull
1
This error occurs when you try to load a JasperReports file with the .jasper extension (the compiled form of the report), for example:
c://reports//EmployeeReport.jasper

Instead, load the report file with the .jrxml extension, for example:
c://reports//EmployeeReport.jrxml

See problem screenshot: https://i.stack.imgur.com/D5SzR.png
See solution screenshot: https://i.stack.imgur.com/VeQb9.png

  
  
1

I had a similar problem. I had saved some XML in a file, and reading it into a DOM document failed because of a special character. I then used the following code to fix it:

String enco = new String(Files.readAllBytes(Paths.get(listPayloadPath+"/Payload.xml")), StandardCharsets.UTF_8);

Document doc = builder.parse(new ByteArrayInputStream(enco.getBytes(StandardCharsets.UTF_8)));

Let me know if it works for you.

fcdt
0

I ran into the same problem, and after a long investigation of my XML file I found the cause: there were a few unescaped characters such as « ».
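
One way to read "escaping" here is to write such characters as numeric character references, so the file contains only ASCII bytes and decodes the same way in UTF-8, ISO-8859-1 and Windows-1252. A minimal sketch, with a hypothetical class and method name, intended for text content rather than for element or attribute names:

```java
public class NumericEscape {

    // Replaces every non-ASCII code point with a decimal numeric character
    // reference such as &#171; and leaves ASCII characters untouched.
    public static String escapeNonAscii(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        text.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                sb.appendCodePoint(cp);
            } else {
                sb.append("&#").append(cp).append(';');
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+00AB and U+00BB are the guillemets mentioned above.
        System.out.println(escapeNonAscii("\u00ABquoted\u00BB"));
    }
}
```

This does not escape markup characters such as `<` or `&`; it only sidesteps the byte-encoding question for non-ASCII text.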

0

This is for those who, like me, understand character encoding principles, have read Joel's article (which is funny, as it contains wrong characters itself), and still can't figure out what is going on (spoiler alert: I'm a Mac user). Your solution may be as simple as removing your local repo and cloning it again.

My code base had not changed since the last time it ran OK, so it made no sense to have UTF errors, given that our build system never complained about it... until I remembered that I had accidentally unplugged my computer a few days earlier with IntelliJ IDEA and the whole stack running (Java/Tomcat/Hibernate).

My Mac did a brilliant job of pretending nothing had happened, and I carried on with business as usual, but the underlying file system was left corrupted somehow. I wasted a whole day trying to figure this one out. I hope it helps somebody.

felipe
0

I had the same issue. In my case, the "-Dfile.encoding=UTF8" argument was missing from JAVA_OPTIONS in the startWebLogic.cmd file on the WebLogic server.

chk.buddi
0

You have a dependency that needs to be removed, such as the following:

   implementation 'org.apache.maven.plugins:maven-surefire-plugin:2.4.3'
younes
0

This error surprised me in production...

The error occurs because the character encoding is wrong, so the best solution is to implement a way to auto-detect the input charset.

This is one way to do it:

...    
import org.xml.sax.InputSource;
...

InputSource inputSource = new InputSource(inputStream);
someReader(
    inputSource.getByteStream(), inputSource.getEncoding()
  );

Input sample:

<?xml version="1.0" encoding="utf-16"?>
<rss xmlns:dc="https://purl.org/dc/elements/1.1/" version="2.0">
<channel>
...
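
When the document carries a byte order mark or an encoding declaration, as in the sample above, one option is to hand the parser the raw byte stream and let it work out the encoding itself. A minimal sketch under that assumption; the class name, the `rootText` helper and the sample document are hypothetical:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class DeclaredEncodingParse {

    // Parses raw bytes without naming a charset; the parser reads the byte
    // order mark and the encoding declaration to decide how to decode them.
    public static String rootText(byte[] xmlBytes) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xmlBytes));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"utf-16\"?><title>caf\u00E9</title>";
        // StandardCharsets.UTF_16 encodes as big-endian with a byte order mark.
        byte[] utf16 = xml.getBytes(StandardCharsets.UTF_16);
        System.out.println(rootText(utf16));
    }
}
```

This only helps when the declaration is truthful; if a file claims UTF-8 but contains bytes in another encoding, the original error still appears.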
Daniel De León