
I have gone through a lot of answers on this topic but could not solve my issue, so I am asking.

I am getting my XML in a string. Some node values contain "< 6" as content.

As a result, I get this exception:

Name cannot begin with the ' ' character, hexadecimal value 0x20. Line 3270, position 54.

Here is the code:

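// Escape every '&' that does not already start a named or numeric entity reference (e.g. &amp; or &#60;).
string patternToReplaceAnd = "&(?![A-Za-z][A-Za-z0-9]*;|#[0-9]+;|#x[0-9A-Fa-f]+;)";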
Regex reg = new Regex(patternToReplaceAnd);
xml = reg.Replace(xml, "&amp;");
XDocument xDoc = XDocument.Parse(xml);

Can anyone help me out?

Ani
  • Fix your XML before processing it. – CodeCaster May 23 '17 at 07:38
  • Where is the XML, and why is it broken? Let the supplier of the XML fix the problem. Who knows what other issues arise... – Patrick Hofman May 23 '17 at 07:39
  • some name in `xml` starts with space. It's on line 3270, position 54. :) – Nino May 23 '17 at 07:39
  • You can't reliably fix unescaped characters like this because you don't know exactly which instances of `<` need escaping, barring some complicated parsing. On the supplier side, the easiest fix is to either use libraries written to produce valid XML, or put all values in [CDATA sections](https://stackoverflow.com/questions/2784183/what-does-cdata-in-xml-mean) so you don't have to worry about escaping. – Jeroen Mostert May 23 '17 at 07:43
  • I clean the string as much as possible, because I get it from a number of sources. What I am stuck on is a node containing `fully furnished | Short-Stay (< 6 mo.) possible`; that is where I get the exception. – Ani May 23 '17 at 07:45
  • *but* if your only problem is like `"< "` (less-than plus space), then try `xml = xml.Replace("< ", "& ")`. It would be morally wrong (and it wouldn't guarantee that there are no other problems), but it is "free to try" :-) – xanatos May 23 '17 at 07:46
  • Ah wait... if it is user-inserted text then there is no hope :-) Next time they'll write `<6` months without the space :-) – xanatos May 23 '17 at 07:46
  • @xanatos Yes. I tried replacing `<` with `&`, but as the XML comes from the client side this is not a permanent solution. – Ani May 23 '17 at 07:52
  • @Ani While it is probably possible to solve the `<` problem in 20 lines (in the end you know that your xml has a fixed list of valid elements `` would be very difficult (you would really need to tokenize the xml)... – xanatos May 23 '17 at 08:19
  • Anyone who feels malicious (or just another inept automated system) could use `` in a description, thereby guaranteeing your XML will be malformed in a way you can't fix. Fix the source of your XML if possible. If it's not possible, consider if you're getting paid enough to do this. – Jeroen Mostert May 23 '17 at 09:19
  • Uh, you shouldn't replace `<` by `&`. The side _supplying_ the XML to you should replace every `<` and `>` that is content text inside tags by `&lt;` and `&gt;`. – Nyerguds May 23 '17 at 09:26
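
For reference, the supplier-side fix suggested in the comments (use a library that produces valid XML, or wrap values in CDATA) could look roughly like the sketch below. It assumes the supplier also uses .NET and LINQ to XML; the class name `SupplierSideSketch` and the element name `description` are hypothetical and only serve the illustration.

```csharp
using System;
using System.Xml.Linq;

public static class SupplierSideSketch
{
    // LINQ to XML escapes text content on serialization,
    // so a literal '<' in the value is written as &lt;.
    public static string BuildEscaped(string text)
    {
        // "description" is a hypothetical element name.
        return new XElement("description", text).ToString();
    }

    // Alternative: wrap the raw text in a CDATA section instead of escaping it.
    public static string BuildCData(string text)
    {
        return new XElement("description", new XCData(text)).ToString();
    }
}
```

With the value from the question, `BuildEscaped("Short-Stay (< 6 mo.) possible")` yields `<description>Short-Stay (&lt; 6 mo.) possible</description>`, which `XDocument.Parse` accepts without any clean-up on the receiving side.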

2 Answers


You say you're getting your XML in a string. You're not. You're getting garbage in a string.

If the garbage is really important to you then you can try and convert it to XML. How you do that depends on just how bad it is, which we can't really judge.

Much better: refuse to accept shoddy goods. Go back to the supplier and tell them to generate real XML.
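
If it helps when going back to the supplier, one way to reject malformed input and report exactly where it breaks is sketched below. This is only an illustration: `XmlGate` and `TryParse` are hypothetical names, while `XmlException.LineNumber` and `XmlException.LinePosition` are the standard properties behind the "Line 3270, position 54" part of the message in the question.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

public static class XmlGate
{
    // Returns true and the parsed document when the input is well-formed.
    // Otherwise returns false and a message carrying the parser's location,
    // which can be forwarded to whoever produced the input.
    public static bool TryParse(string xml, out XDocument doc, out string error)
    {
        try
        {
            doc = XDocument.Parse(xml);
            error = null;
            return true;
        }
        catch (XmlException ex)
        {
            doc = null;
            error = $"Line {ex.LineNumber}, position {ex.LinePosition}: {ex.Message}";
            return false;
        }
    }
}
```

A caller would check the return value and log or return `error` instead of trying to repair the input.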

Michael Kay

I do realize that this question is old but I came across the same problem today and I hope my answer will help someone who may land on this question in the future.

The problem is content that includes < followed by a space. You will have to replace that < with &lt; so that it is not read as the start of a malformed XML tag.

xml = xml.Replace("< ", "&lt; "); // include the space after < to avoid replacing actual tags
XDocument xDoc = XDocument.Parse(xml);
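
As the comments on the question point out, the space-based replacement stops working as soon as someone writes `<6` without the space. A slightly broader heuristic, sketched below under the assumption that the only damage is a stray `<` in text, escapes every `<` that is not followed by a character that can start a tag (a letter, `_`, `:`, `/`, `!` or `?`). The names `LooseLessThanSketch` and `EscapeStrayLessThan` are hypothetical. It is still a guess, not a parser: text such as `a <b` would still be taken for a tag, and a `<` inside a comment or CDATA section would be rewritten as well.

```csharp
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class LooseLessThanSketch
{
    // Matches '<' unless the next character could begin a start tag,
    // end tag, comment, CDATA section, DOCTYPE or processing instruction.
    private static readonly Regex StrayLessThan =
        new Regex(@"<(?![\p{L}_:/!?])");

    // Heuristic only: assumes stray '<' characters are the sole defect.
    public static string EscapeStrayLessThan(string xml)
    {
        return StrayLessThan.Replace(xml, "&lt;");
    }
}
```

Usage would be `xml = LooseLessThanSketch.EscapeStrayLessThan(xml);` before calling `XDocument.Parse(xml)`.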
RealSollyM