I am trying to parse a UTF-16LE XML string that is embedded within a file. I can read the string into a String object, and it looks fine when I view it in the watch window. The problem is that parsing it keeps throwing an exception. I have tried specifying UTF-16 and UTF-16LE both in the getBytes call and in the InputStreamReader constructor, but the exception is still thrown.

DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = null;

builder = builderFactory.newDocumentBuilder();      
Document document = null;
byte[] bytes = xmlString.getBytes();
ByteArrayInputStream inputStream = new ByteArrayInputStream(bytes);
InputSource is = new InputSource(new InputStreamReader(inputStream));
document = builder.parse(is); // throws SAXParseException

Edit: This is on Android. Here is the top of the stack trace:

12-18 13:51:12.978: W/System.err(5784): org.xml.sax.SAXParseException: name expected (position:START_TAG @1:2 in java.io.InputStreamReader@4118c880)
12-18 13:51:12.978: W/System.err(5784): at org.apache.harmony.xml.parsers.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:146)
12-18 13:51:12.978: W/System.err(5784): at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:107)

rplankenhorn
  • What is wrmHeaderXml? A string, an object, or what? It seems that you are converting from bytes to chars and then from chars back to bytes. Why? If you already have the bytes, just feed them to InputSource(InputStream) – leonbloy Dec 17 '12 at 18:59
  • I guess it's a string. If you have a String object (and you state you can view it in the console), then the internal encoding doesn't matter, because it's a Java String – Raffaele Dec 17 '12 at 19:18

2 Answers

Here is what I ended up doing:

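// Requires java.io.ByteArrayInputStream, java.io.InputStream, java.nio.charset.Charset,
// javax.xml.parsers.DocumentBuilder(Factory), and org.w3c.dom.Document.
DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = builderFactory.newDocumentBuilder();

// Encode the string as UTF-16LE bytes. getBytes(Charset) returns an exactly sized array,
// whereas Charset.encode(...).array() may expose unused trailing capacity of the buffer.
byte[] bytes = xmlString.getBytes(Charset.forName("UTF-16LE"));
InputStream inputStream = new ByteArrayInputStream(bytes);
Document document = builder.parse(inputStream);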

Source: How does one create an InputStream from a String?

rplankenhorn
  • What's the purpose of encoding a String? – Raffaele Dec 17 '12 at 19:20
  • If I just called xmlString.getBytes() and passed the result into the ByteArrayInputStream, it would throw the SAXParseException. – rplankenhorn Dec 17 '12 at 22:19
  • But why do you need to extract the bytes from the string at all? Just pass a [`StringReader`](http://docs.oracle.com/javase/6/docs/api/java/io/StringReader.html) to the `InputSource` ctor – Raffaele Dec 17 '12 at 22:34
  • I tried just passing in a StringReader and it still throws the exception. I think it has to do with encoding. – rplankenhorn Dec 18 '12 at 15:42
  • A Java string **doesn't have** any associated encoding. It's a String. Internally it holds UTF-16 code units, but this doesn't matter to the StringReader implementation – Raffaele Dec 18 '12 at 15:46
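
To illustrate the point made in the comments above, here is a minimal sketch showing that the bytes depend only on the charset chosen when the String is encoded. The class name, the helper, and the sample string (which assumes a leading U+FEFF character, something the thread only suspects) are hypothetical.

```java
import java.nio.charset.Charset;

class EncodingDemo {
    /** Hypothetical helper: renders bytes as space-separated uppercase hex pairs. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Assumed sample: a BOM character (U+FEFF) followed by a tiny XML document.
        String s = "\uFEFF<a/>";

        // The same String produces different bytes under different charsets.
        System.out.println(hex(s.getBytes(Charset.forName("UTF-8"))));
        // prints "EF BB BF 3C 61 2F 3E"
        System.out.println(hex(s.getBytes(Charset.forName("UTF-16LE"))));
        // prints "FF FE 3C 00 61 00 2F 00 3E 00"
    }
}
```

Under that assumption, the UTF-16LE bytes begin with 0xFF 0xFE only because the String itself starts with U+FEFF; the UTF-16LE encoder does not add a byte order mark of its own.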
There's no need to convert back and forth between strings and bytes in the same program. It's as easy as:

String xml = "<root><tag>Hello World!</tag></root>";

Document dom = DocumentBuilderFactory.newInstance()
    .newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
Raffaele
  • This throws a SAXParseException on the parse line. – rplankenhorn Dec 18 '12 at 15:41
  • No need to be rude. When I use the parse line above with the XML that I am parsing, it throws a SAXParseException. I posted the top of the stack trace above. If I just call xmlString.getBytes() and look at the binary data, it is UTF-16LE encoded. The first two bytes are 0xFF 0xFE, which tells me it is little-endian UTF-16. – rplankenhorn Dec 18 '12 at 19:55
  • @rplankenhorn it sounds like your `xmlString` actually contains the BOM as its first character. If you stripped this first character off the string and then created a StringReader from the result then it should parse fine from that without the back-and-forth to bytes. – Ian Roberts Dec 18 '12 at 20:08
  • Every other character is also 0x00 so I am not sure that will work. – rplankenhorn Dec 18 '12 at 21:48
  • @rplankenhorn I needed to be rude, because yours is not the attitude of an SO user - you didn't try to understand the problem, you didn't understand my answer, and you even downvoted it, which is unfair, because you didn't understand it and it throws an exception on your Android system (BTW, that stack trace is **useless**, so I was right in the assumption that you don't know how to read a stack trace - you need to paste the root causes, too). Based on your comments, **the input string is the problem**, so please tell us where you get it from or how you build it – Raffaele Dec 18 '12 at 21:55
  • I understand your answer and I understand the problem. Your answer just didn't work and still threw an exception, so I didn't mark it as the answer. I apologize for downvoting it. I will retract that. As for the XML, I can't post it because it is proprietary and for a work client. I do know how to read a stack trace, but I didn't want to post my full one because the classes are proprietary. Since I figured out my problem, I think the point is moot. – rplankenhorn Dec 18 '12 at 23:30
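
Ian Roberts' suggestion in the comments above (strip a leading BOM character, then parse from a StringReader) might look like the following minimal sketch. The class and helper names are hypothetical, and it assumes the string's only problem is a single leading U+FEFF character, which the thread never confirms.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class BomStrippingParser {
    // U+FEFF is the byte order mark. As a character inside a String it is
    // not allowed before the first '<', so a parser may reject it.
    private static final char BOM = '\uFEFF';

    /** Hypothetical helper: drops one leading BOM character if present. */
    static String stripBom(String xml) {
        return (!xml.isEmpty() && xml.charAt(0) == BOM) ? xml.substring(1) : xml;
    }

    /** Parses XML held in a String via a Reader, so no byte encoding is involved. */
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(stripBom(xml))));
    }

    public static void main(String[] args) throws Exception {
        // Assumed sample input: a BOM character followed by well-formed XML.
        String xmlString = "\uFEFF<root><tag>Hello World!</tag></root>";
        Document document = parse(xmlString);
        System.out.println(document.getDocumentElement().getTagName()); // prints "root"
    }
}
```

If the string turns out to hold something other than a stray BOM (for example, NUL characters between every letter), this sketch would not help, and the way the string is built from the file would need fixing instead.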