3

I am trying to load something which claims to be an XML document into any type of .NET XML object: XElement, XmlDocument, or XmlTextReader. All of them throw an exception:

Name cannot begin with the '0' character, hexadecimal value 0x30

The error relates to this bit of 'XML':

<chart_value 
    color="ff4400" 
    alpha="100" 
    size="12" 
    position="cursor" 
    decimal_char="." 
    0="" 
/>

I believe the problem is that the author should not have named an attribute 0.

If I could change this I would, but I do not have control of this feed. I suppose those who use it are using more permissive tools. Is there any way I can load this as XML without throwing an error?

There is no XML declaration either, nor a namespace or contract definition. I was thinking I might have to turn it into a string and do a replace, but this is not very elegant. I was wondering whether there are any other options.

svick
matthewbaskey
  • An XML parser will always choke on invalid XML documents. This was probably hand crafted, or built using string concatenation, not by an XML tool. – Oded Oct 06 '11 at 13:04
  • This is, indeed, a totally invalid piece of XML - so you will need to do the search-and-replace. – Jeremy McGee Oct 06 '11 at 13:05
  • Do excuse my pedantic edit. Wanted to make clear that the root of your problem is that *this isn't XML*. – AakashM Oct 06 '11 at 13:40
  • What is the best way to load bad XML: push it into a string and do a Replace on it? If I load an XmlReader and then call reader.ReadOuterXml(), it still throws the error. – matthewbaskey Oct 06 '11 at 13:54
  • Because it still is not valid XML. XML is VERY strict, and some programmers are very stupid - had to deal with such an aberration myself. This is NOT XML, so an XML parser won't work. And XML is VERY strict about validating basic structure. – TomTom Oct 06 '11 at 19:03
  • The best way to load bad XML is to not load it at all. Tell the sender to send you XML. – John Saunders Oct 06 '11 at 19:17

4 Answers

3

Just prefix the numeric attribute name with '_'. For example, replace "0=" with "_0=". I hope that will fix the problem, thanks.

Tareq
3

As many have said, this is not XML.

Having said that, it's almost XML and WANTS to be XML, so I don't think you should use a regex to screw around inside of it (here's why).

Wherever you're getting the stream, dump it into a string, change 0= to something like zero=, and try parsing it.

Don't forget to reverse the operation if you have to return-to-sender.


If you're reading from a file, you can do something like this:

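        // Needs: using System.IO; using System.Xml;
        var txt = File.ReadAllText(@"\path\to\wannabe.xml");
        // Rename the invalid attribute 0 to a name the parser accepts.
        // Note: this also rewrites any other "0=" in the text.
        var clean = txt.Replace("0=", "zero=");
        var doc = new XmlDocument();
        doc.LoadXml(clean);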

This is not guaranteed to remove all potential XML problems -- but it should remove the one you have.
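
If the data has to go back to the sender, the rename and its reversal can be kept together. The following is only a minimal sketch: the class name `WannabeXml` and its `Load`/`Save` methods are made up for illustration, and it assumes the bad attribute is always preceded by a space, as in the sample from the question.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical helper, not part of any library.
public static class WannabeXml
{
    // Assumption: the invalid attribute always appears as " 0=" (space before it).
    private const string BadAttribute = " 0=";
    // "zero" is an arbitrary valid name; pick one the feed does not already use.
    private const string Placeholder = " zero=";

    // Rename the invalid attribute, then let the normal parser do its job.
    public static XElement Load(string raw)
    {
        return XElement.Parse(raw.Replace(BadAttribute, Placeholder));
    }

    // Serialize and restore the original (invalid) attribute name.
    public static string Save(XElement element)
    {
        return element.ToString(SaveOptions.DisableFormatting)
                      .Replace(Placeholder, BadAttribute);
    }
}
```

After `WannabeXml.Load(...)` the value is available as `element.Attribute("zero")`. Both directions are blunt text replacements, so an attribute value that happens to contain " 0=" or " zero=" would be rewritten as well.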

Michael Paulukonis
  • What is the best way to load bad XML: push it into a string and do a Replace on it? If I load an XmlReader and then call reader.ReadOuterXml(), it still throws the error. – matthewbaskey Oct 06 '11 at 13:57
  • How is your data arriving originally? From a file, or a webservice? If the latter, it's probably a string already. If a file, use the StreamReader, and ReadToEnd() which will provide you with a string. – Michael Paulukonis Oct 06 '11 at 14:13
  • Instead of calling `Close()` and `Dispose()` (which is redundant), it's better to use `using`. Or just use `File.ReadAllText()` instead. – svick Oct 06 '11 at 14:51
  • Sorry, I got bitten by some memory leak trouble, and started closing and disposing whenever both methods exist. Which is voodoo coding, and I should know better. Cleaned up the sample per your suggestions. – Michael Paulukonis Oct 06 '11 at 18:59
  • The thing is, I have to pull the XML from an HTTP request. Is there a way to pull an HTTP request into a File object? – matthewbaskey Oct 06 '11 at 20:50
  • If you're pulling it from an HTTP request, isn't it coming in as a string in the first place? If so, no need to dump into a file. – Michael Paulukonis Oct 07 '11 at 13:52
2

It might claim to be an XML document, but the claim is clearly false, so you should reject the document.

The only good way to deal with bad XML is to find out what bit of software is producing it, and either fix it or throw it away. All the benefits of XML go out of the window if people start tolerating stuff that's nearly XML but not quite.

Michael Kay
  • As programmers, though, we don't always have the option of telling our boss to tell our vendors to take a hike. I once had the vendor of a vendor of a client demand that the XML we sent them have CRLF endings instead of CR (or something like that). Something that shouldn't matter at all for XML. But there was no way of making them change, so I had to git-r-done. And then there was the co-worker who complained at high volume when the ORDER of XML elements changed. He had to rewrite his code! Poor thing..... – Michael Paulukonis Oct 20 '11 at 20:30
  • No, but you can make it clear to your boss that you are no longer exchanging data with the vendor in XML, but rather in a proprietary format, and that this will increase costs and risks. Translate it into business terms that a boss can understand, and you will get a hearing. – Michael Kay Oct 24 '11 at 23:26
0

The 0="" obviously uses an invalid attribute name, 0 (an XML name cannot begin with a digit). If you cannot fix it at the source that created it, you'd probably have to do a find/replace on the text before parsing. You might be able to use a regex to target the replacement more precisely than a plain string replace.
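
As a rough illustration of the regex idea, the sketch below prefixes an underscore to anything that looks like an attribute name starting with a digit. The class name `AttributeNameFixer` and the pattern are assumptions for illustration only; the pattern is a heuristic, not an XML tokenizer, so it can also match similar-looking text inside element content or attribute values.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Hypothetical helper, not part of any library.
public static class AttributeNameFixer
{
    // Whitespace, then a name starting with a digit, then ="..." or ='...'.
    // The lookbehind and lookahead keep the surrounding text out of the match.
    private static readonly Regex DigitLedName =
        new Regex(@"(?<=\s)(\d[\w.\-]*)(?=\s*=\s*[""'])");

    // Turns 0="" into _0="" so a standard parser accepts the name.
    public static string Fix(string raw)
    {
        return DigitLedName.Replace(raw, "_$1");
    }
}
```

The result can then be handed to a normal parser, for example `XElement.Parse(AttributeNameFixer.Fix(raw))`. Values such as `alpha="100"` are left alone because the digits there are preceded by a quote rather than whitespace.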

Chris Snowden