I get an exception when parsing XML that contains the characters '&' and '<'. I have read that XML containing these characters unescaped is not valid, but I receive it from a third party, so I can't reformat it at the source.

Below is my XML parsing code, which uses XDocument:

string data = profile.Content.ReadAsStringAsync().Result; // Read input
XDocument doc = new XDocument();
if (data != "")
{
    string rawHtml = WebUtility.HtmlDecode(data);
    doc = XDocument.Parse(rawHtml); // Parse input into XDocument
}

Here, data contains the actual XML input, not an XML file path. Please suggest how to handle these special characters.

Rutuja
  • Can you provide an example of the XML? Also, that is not valid XML; whoever sends it in that format might be sending incorrect data. – Matt Sep 05 '19 at 09:55
  • This is going to be next to impossible to deal with, unless you know that you can safely replace ALL instances of `&`, `<` etc. with the correct escape sequences. Can you inform the third party that their data is bad? – Matthew Watson Sep 05 '19 at 09:55
  • You are using the correct code. I would like to see a sample of the XML before and after the HtmlDecode call. It appears HtmlDecode is not working properly. I suspect something is wrong with the encoding. – jdweng Sep 05 '19 at 09:55
  • Part of the input: ```P&G Road```. After WebUtility.HtmlDecode(data), it appears to remain unchanged. – Rutuja Sep 05 '19 at 10:12
  • @Matthew: "Can you inform the third party that their data is bad?" No, they send the XML data to us for formatting, and we are expected to correct the data. – Rutuja Sep 05 '19 at 10:15

1 Answer

This data is not XML.

Check what you agreed with the third party.

If the contract was to exchange data in XML, then they are failing to satisfy the contract and you should deal with it the way you would deal with any other faulty goods from a supplier: return it and ask for your money back.

If the agreement didn't specify that they would send you XML, then you shouldn't be trying to parse it with an XML parser.

Michael Kay
  • What do you mean by "this data is not valid XML"? Is P&G Road invalid and PG Road valid? – Rutuja Sep 05 '19 at 12:49
  • To be valid (or technically, to be well-formed) the `&` must be written as `&amp;`. – Michael Kay Sep 05 '19 at 13:36
  • `What do you mean by this data is not valid XML?` We mean: It is not valid XML... How else can we put it? It contains an unescaped ampersand character which should be `&amp;`. Likewise, a `<` should be escaped as `&lt;`. – Matthew Watson Sep 05 '19 at 14:20
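
If returning the data to the sender is not an option, one possible workaround is to escape bare ampersands as `&amp;` before handing the text to the parser. The following is only a minimal sketch under stated assumptions, not a general repair tool: the class name `LenientXml` and its methods are hypothetical, the input is assumed to contain no CDATA sections and no DTD-defined entities (both would be altered incorrectly), and stray `<` characters are deliberately left alone because they cannot be told apart from real tag openings with a simple pattern.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Hypothetical helper: the names here are illustrative, not part of any library.
public static class LenientXml
{
    // Matches an '&' that does NOT begin one of the five predefined XML
    // entities or a decimal/hexadecimal character reference.
    private static readonly Regex BareAmpersand = new Regex(
        @"&(?!(?:amp|lt|gt|quot|apos);|#[0-9]+;|#x[0-9A-Fa-f]+;)",
        RegexOptions.Compiled);

    // Replaces every bare '&' with '&amp;' and leaves existing
    // entity and character references untouched.
    public static string EscapeBareAmpersands(string input)
    {
        return BareAmpersand.Replace(input, "&amp;");
    }

    // Escapes bare ampersands, then parses. Still throws XmlException
    // if the text is malformed in any other way (for example a stray '<').
    public static XDocument Parse(string input)
    {
        return XDocument.Parse(EscapeBareAmpersands(input));
    }
}
```

For example, `LenientXml.Parse("<Address>P&G Road</Address>")` parses successfully, and `Root.Value` reads back as `P&G Road` (the `Address` element name is made up for illustration). Note that `WebUtility.HtmlDecode` works in the opposite direction: it turns `&amp;` into `&`, so calling it before `XDocument.Parse` can make well-formed input fail.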