
I have an XML string that I need to parse in C# (using `xml.LoadXml()`, where `xml` is an `XmlDocument`). But the XML contains some special characters such as `<`, `>`, `&` and `"`. I have written code that replaces these special characters with their escape sequences.

But the problem is that the `<` and `>` of the XML tags themselves also get replaced with escape sequences. How can I resolve this?

I want to escape only the extra special characters inside the values, not the characters that make up the XML tags.

XML:

<?xml version="1.0" encoding="utf-8"?>
<Objects>
    <Object>
        <t2>test<
        </t2>
        <t3>test</t3>
        <s4>76</s4>
        <s7>321</s7>
        <t4>test</t4>
        <t6>test&</t6>
        <t8>NY</t8>
    </Object>
</Objects>
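If the code that generates this XML really cannot be changed, a best-effort repair is possible. In element content, only `<` and `&` must be escaped; `>` and `"` are legal there, except for the `]]>` sequence. Below is a minimal sketch that assumes the stray characters appear only in element text, not in attribute values, comments or CDATA. `XmlRepair`, `EscapeStrayMarkup` and `IsWellFormed` are hypothetical names, not .NET APIs.

The sketch escapes a `&` that does not begin a character or entity reference. It also escapes a `<` that is not followed by a letter, `_`, `:`, `/`, `?` or `!`. It is only a heuristic: a value such as `a<b` would still be read as the start of a tag.

```csharp
using System.Text;
using System.Text.RegularExpressions;
using System.Xml;

// HYPOTHETICAL helper, not part of .NET: best-effort repair of XML whose
// element values contain unescaped '&' or '<'. It is a heuristic and cannot
// tell a user-typed "<b>" apart from a real tag.
public static class XmlRepair
{
    // A reference such as &amp;, &#60; or &#x3C;.
    private static readonly Regex CharOrEntityRef =
        new Regex(@"&(?:[A-Za-z_][A-Za-z0-9._-]*|#[0-9]+|#x[0-9A-Fa-f]+);");

    public static string EscapeStrayMarkup(string xml)
    {
        var sb = new StringBuilder(xml.Length);
        for (int i = 0; i < xml.Length; i++)
        {
            char c = xml[i];
            if (c == '&')
            {
                // Keep the '&' only if a valid reference starts exactly here.
                Match m = CharOrEntityRef.Match(xml, i);
                sb.Append(m.Success && m.Index == i ? "&" : "&amp;");
            }
            else if (c == '<')
            {
                // Tags, end tags, declarations, comments and CDATA all start with
                // '<' followed by a name-start char, '/', '?' or '!'.
                char next = i + 1 < xml.Length ? xml[i + 1] : '\0';
                bool startsMarkup = char.IsLetter(next) || next == '_' || next == ':'
                                    || next == '/' || next == '?' || next == '!';
                sb.Append(startsMarkup ? "<" : "&lt;");
            }
            else
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }

    // True if XmlDocument.LoadXml accepts the string.
    public static bool IsWellFormed(string xml)
    {
        try
        {
            new XmlDocument().LoadXml(xml);
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }
}
```

With the XML above, `test<` in `t2` would become `test&lt;` and `test&` in `t6` would become `test&amp;`, while the tags and the `<?xml ...?>` declaration are left alone.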
halfer
Dreamer
  • Huh? Are you saying your XML is malformed and you're trying to correct it? Some anonymised XML as a sample might be useful to illustrate your issue. – ProgrammingLlama Nov 06 '19 at 07:26
  • Sample XML: the same XML as in the question above. In it, the `t2` tag has an extra `<` character and `t6` has an `&`, and these characters prevent me from parsing the XML. – Dreamer Nov 06 '19 at 07:32
  • You should [edit your question](https://stackoverflow.com/posts/58724974/edit) – ProgrammingLlama Nov 06 '19 at 07:36
  • 1
    Where are you getting this XML from? `test<`, for example, should already be escaped as `test&lt;` in valid XML. – ProgrammingLlama Nov 06 '19 at 07:38
  • It comes from the UI itself: we take these values from the UI and wrap them in XML tags. – Dreamer Nov 06 '19 at 07:46
  • 1
    Wait, you're generating this XML manually? In that case, this is an [XY problem](https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem). Your problem isn't fixing the XML, it's how to not generate it like this in the first place. The simplest option would be to [serialize a C# object](https://stackoverflow.com/questions/4123590/serialize-an-object-to-xml) into XML. – ProgrammingLlama Nov 06 '19 at 07:46
  • OK, thank you! Is there any way to deal with such malformed XML strings? Is there any way to replace only these characters and not the XML tag characters? For now I cannot change the existing logic, so I need to handle it here. – Dreamer Nov 06 '19 at 09:07
  • 1
    I don't really have a good solution. If the user writes `` and then that becomes `` in your XML, is `` part of the valid XML, or should it be escaped? How can you tell? – ProgrammingLlama Nov 07 '19 at 00:39
  • You need to escape the values before you "wrap" them in XML tags, as you put it. But you really should use a tool ([XmlWriter](https://learn.microsoft.com/en-us/dotnet/api/system.xml.xmlwriter)) to create the XML in the first place rather than concatenating strings, and this problem will go away. – Magnus Apr 07 '20 at 11:44
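The fix the comments point to is to escape the UI values at the point where the XML is built. Here is a minimal sketch with `XmlWriter`. It assumes the values arrive as element-name/value pairs; `ObjectsXmlBuilder` and `Build` are hypothetical names. `WriteElementString` escapes special characters such as `<` and `&` in the value, so no manual replace step is needed.

```csharp
using System.Collections.Generic;
using System.Text;
using System.Xml;

// HYPOTHETICAL builder: writes one <Object> per call from name/value pairs
// (for example values taken from the UI) instead of concatenating strings.
public static class ObjectsXmlBuilder
{
    public static string Build(IEnumerable<KeyValuePair<string, string>> fields)
    {
        // No declaration: a StringBuilder target would otherwise declare utf-16.
        var settings = new XmlWriterSettings { Indent = true, OmitXmlDeclaration = true };
        var sb = new StringBuilder();
        using (XmlWriter writer = XmlWriter.Create(sb, settings))
        {
            writer.WriteStartElement("Objects");
            writer.WriteStartElement("Object");
            foreach (var field in fields)
            {
                // The value is escaped by the writer, e.g. "test<" -> "test&lt;".
                writer.WriteElementString(field.Key, field.Value);
            }
            writer.WriteEndElement(); // </Object>
            writer.WriteEndElement(); // </Objects>
        } // disposing the writer flushes it into sb
        return sb.ToString();
    }
}
```

For example, passing `t2` = `test<` produces `<t2>test&lt;</t2>`, which `LoadXml` reads back as `test<`. Serializing a C# object with `XmlSerializer`, as suggested above, gives the same guarantee.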

1 Answer


Try using `XPathNavigator` to extract it. If the special character is in an inner value, extract it with `.../@yourValue`. See my example here: https://stackoverflow.com/a/58715025/8916824
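Here is a minimal sketch of reading values with `XPathNavigator`. It assumes the XML is already well-formed; as the comments below point out, `LoadXml` rejects the question's raw XML before any navigation happens. `XPathLookup.ValueAt` is a hypothetical helper. An attribute is selected with an `@name` step, which is what the `.../@...` path in the answer refers to.

```csharp
using System.Xml;
using System.Xml.XPath;

// HYPOTHETICAL helper: load the XML and return the string value of the first
// node matching an XPath expression. Only works on well-formed XML.
public static class XPathLookup
{
    public static string ValueAt(string xml, string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);                  // throws XmlException if malformed
        XPathNavigator nav = doc.CreateNavigator();
        XPathNavigator node = nav.SelectSingleNode(xpath);
        return node?.Value;                // null when nothing matches
    }
}
```

On `<Objects><Object id="a&amp;b"><t2>test&lt;</t2></Object></Objects>`, the path `/Objects/Object/t2` yields `test<` and `/Objects/Object/@id` yields `a&b`. The navigator returns the unescaped text, but the input must already be escaped.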

LiadPas
  • Does this fix invalid XML? – ProgrammingLlama Nov 06 '19 at 07:41
  • Yes, with `LoadXml` you treat it like a string. I can write you an example if you like. – LiadPas Nov 06 '19 at 07:56
  • [Example using OP's XML](https://rextester.com/YVD90796) and your proposed use of `XmlDocument`. – ProgrammingLlama Nov 06 '19 at 08:04
  • As an XML tag? A value? What exactly does the XML string look like? – LiadPas Nov 06 '19 at 08:04
  • Oh, thank you, I understand the problem with the first solution. Then I guess you must escape the characters as described at https://www.advancedinstaller.com/user-guide/xml-escaped-chars.html, like this: `string xml = "test<test76321test@lt;test&NY"; var doc = new XmlDocument(); doc.LoadXml(xml); var xmlNav = doc.CreateNavigator(); string val = xmlNav.SelectSingleNode("/Objects/Object/t2").Value;` – LiadPas Nov 06 '19 at 08:56
  • And now you've reached OP's question. OP has XML that hasn't escaped this data, so it's invalid. Now how can OP fix that? That's OP's question. – ProgrammingLlama Nov 07 '19 at 00:42