I have some XML returned as a string from a web service. Unfortunately, I have no control over how it is returned to me. It's usually valid XML, but on occasion I'll receive some that is slightly invalid, which leads to this issue.

The string basically reads like so:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<STATUS _Description="...will contact you with a ("Quote") when ..." />

When I try to do: XDocument.Parse(xmlString);

It throws the following error:

'Quote' is an unexpected token. Expecting white space. Line 15, position 113.

This is to be expected, but I can't figure out the correct string manipulation to fix it. I've tried a number of things including:

static string RemoveInvalidXmlChars(string xmlString)
{
  var validXmlChars = xmlString.Where(ch => XmlConvert.IsXmlChar(ch)).ToArray();
  return new string(validXmlChars);
}

And: xmlString = xmlString.Replace("\"", "&quot;"); (as well as numerous other combinations, such as Replace(@"""", ""))

Which throws the error:

'&' is an unexpected token. The expected token is '"' or '''. Line 1, position 15.

And I've also tried xmlString = SecurityElement.Escape(xmlString); (it throws the same error as above). I've also tried using XmlWriter/Reader to modify the string, but the reader throws an error when it reaches the offending element.

My next guess was to use Regular Expressions to convert just the nested quotes to single quotes, but RegEx is kind of foreign to me. How can I fix this so that I can parse it using XDocument.Parse?

kjhughes
vinco83
  • Generally not much can be done to fix invalid XML. (Attribute values can't contain unescaped quotes). Short of asking for valid XML you can try HtmlAgilityPack - since it is designed to read malformed HTML there is some chance that it will be able to recover some content of your "XML". – Alexei Levenkov Aug 23 '15 at 01:08

2 Answers

That string you posted as an XML is from inspecting some variable in Visual Studio while debugging, right?

Well, Visual Studio automatically escapes double quotes so that you can copy the value straight into C# code. Your XML does not actually contain any \" sequences, only plain " characters. Your actual problem is here:

"Thank you for your order! The order is currently being reviewed by a moderator. A moderator will contact you with a ("Quote") when the review is complete."

The problem is the double-quoted "Quote" inside an attribute value that is itself delimited by double quotes. The attribute value ends at the quote just before Quote, so the parser sees Quote as an unexpected token, hence the error. Your XML provider is not escaping the double quotes around the word Quote.
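
To see this in isolation, here is a minimal sketch. The sample strings are hypothetical, shortened versions of the STATUS element from the question, and the IsWellFormed helper is a name made up for illustration. The first string fails because the attribute value ends at the quote before Quote; the second parses because the inner quotes are written as &quot;.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

static class NestedQuoteDemo
{
    // Hypothetical helper: true when the text parses as well-formed XML.
    public static bool IsWellFormed(string text)
    {
        try
        {
            XDocument.Parse(text);
            return true;
        }
        catch (XmlException)
        {
            // XDocument.Parse reports well-formedness errors as XmlException.
            return false;
        }
    }

    public static void Main()
    {
        // Assumed sample data modeled on the question, not the real payload.
        string broken = "<STATUS _Description=\"a (\"Quote\") here\" />";
        string escaped = "<STATUS _Description=\"a (&quot;Quote&quot;) here\" />";

        Console.WriteLine(IsWellFormed(broken));   // False
        Console.WriteLine(IsWellFormed(escaped));  // True
    }
}
```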

Mihai Caracostea
  • You're absolutely right @Mihai! I'm going to edit my question accordingly. But the question still stands, I believe, as to how to eliminate the quotes around "Quote" (unless I'm missing something in your reply?). Thanks for the reply to remind me of those escaped quotes when viewing in VS, however! – vinco83 Aug 22 '15 at 23:27
  • You should signal the web service provider that they provide malformed XML. It's their job to fix it. Otherwise, you can remove the quotes yourself, but that feels kinda wrong. – Mihai Caracostea Aug 22 '15 at 23:32
  • And XML also contains the schema, so applying general rules for patching might not be safe. I suggest you take each case one at a time and hardcode particular fixes, like replacing "Quote" with Quote and so on. This way you run less risk of breaking other XMLs that happen to match the rules you create for patching. – Mihai Caracostea Aug 22 '15 at 23:36
  • @vinco83 Forgot to tag you in the comments above. – Mihai Caracostea Aug 22 '15 at 23:39
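
The case-by-case patching suggested in the comments above could look roughly like the sketch below. It is only an illustration: the KnownDefectPatcher class, the KnownFixes table, and the sample string are all made up here, and this variant escapes the inner quotes as &quot; instead of dropping them. Each entry matches one exact, previously observed defect, so text that does not contain it is left untouched.

```csharp
using System;
using System.Xml.Linq;

static class KnownDefectPatcher
{
    // Hypothetical table of exact defects seen so far and their repairs.
    static readonly (string Bad, string Good)[] KnownFixes =
    {
        ("(\"Quote\")", "(&quot;Quote&quot;)"),
    };

    // Applies every known fix as a literal (non-regex) replacement.
    public static string Patch(string text)
    {
        foreach (var (bad, good) in KnownFixes)
        {
            text = text.Replace(bad, good);
        }
        return text;
    }

    public static void Main()
    {
        // Assumed sample data modeled on the question, not the real payload.
        string raw = "<STATUS _Description=\"will contact you with a (\"Quote\") when ready\" />";

        XDocument doc = XDocument.Parse(Patch(raw));
        Console.WriteLine(doc.Root.Attribute("_Description").Value);
        // prints: will contact you with a ("Quote") when ready
    }
}
```

This only helps with defects you have already seen. Any new kind of breakage will still throw, which is arguably what you want.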
I have some XML returned as a string from a webservice (unfortunately I have no control over how it is returned to me. It's usually valid XML, but on occasion I'll receive some that is slightly invalid, which leads to this issue).

No, you do not have XML. What you have is text that appears to be intended to be XML, but has fallen short of meeting the rules for being well-formed (which, by the way, are different from the rules for being valid). It is not XML. No conformant XML processor can help you here.

The entirely proper way forward is to inform the owner of the web service that their service is broken. They have to escape quotes embedded within attributes, or use the opposite quote style (single vs double quote characters), or use elements for data containing quote characters. They cannot just dump anything into an attribute value and hope for the best.
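
As a rough illustration of those three options, the sketch below shows what the service could emit instead. The strings are hypothetical, shortened versions of the element from the question, and the Description child element is an assumed name. All three are well-formed and carry the same text.

```csharp
using System;
using System.Xml.Linq;

static class WellFormedAlternatives
{
    // Option 1: escape the embedded double quotes as &quot;.
    public static readonly string Escaped =
        "<STATUS _Description=\"a (&quot;Quote&quot;) here\" />";

    // Option 2: delimit the attribute with single quotes instead.
    public static readonly string SingleQuoted =
        "<STATUS _Description='a (\"Quote\") here' />";

    // Option 3: move the text into an element (hypothetical name),
    // where double quotes need no escaping.
    public static readonly string AsElement =
        "<STATUS><Description>a (\"Quote\") here</Description></STATUS>";

    public static void Main()
    {
        Console.WriteLine(XDocument.Parse(Escaped).Root.Attribute("_Description").Value);
        Console.WriteLine(XDocument.Parse(SingleQuoted).Root.Attribute("_Description").Value);
        Console.WriteLine(XDocument.Parse(AsElement).Root.Element("Description").Value);
        // Each line prints: a ("Quote") here
    }
}
```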

You might be advised to attempt to repair the text into well-formed XML. Refuse, unless you enjoy playing Whac-A-Mole with the infinite ways the XML Recommendation can be ignored.

kjhughes