I have an XML file that users can edit, adding their own text to certain attributes, and then upload to my tool. The problem is that they sometimes include < and > in the attribute values. I want to change those to &lt; and &gt;.

For instance:

 <title value="Tuition and fees paid with (Percent<5000) by Gender" />

Loading this causes an error using the following code:

XmlDocument xmldoc = new XmlDocument();
xmldoc.LoadXml(xmlString);

What I need is for every user-editable attribute value to have < and > replaced with the corresponding entities. The problem is that I cannot just do a .Replace("<", "&lt;") on the whole string, because the XML markup itself needs those characters.

How is this done easily? The code is C#.Net.

Alexei Levenkov
cdub
  • That isn't valid XML. You can potentially validate the user input and let the user know it isn't valid, but there is nothing a parser can do. – Glenn Ferrie Aug 30 '19 at 20:06
  • You shouldn't allow users to edit XML (unless they're also developers). Provide a tool for editing the content so they don't touch the XML structure. `.xlsx` documents are zipped XML files but Microsoft doesn't expect users to edit the XML directly to change cell contents. – madreflection Aug 30 '19 at 20:11
  • The duplicate https://stackoverflow.com/questions/44765194/how-to-parse-invalid-bad-not-well-formed-xml covers the "read" portion of your question (it is originally about Java, but the strategies covered are universal, and it includes some .NET-specific ones too). As @madreflection pointed out, letting users edit XML is generally a bad idea (I've known maybe a couple of people who can encode correctly by hand every time... but anyone else who was not directly or indirectly involved in the XML W3C committee makes mistakes often enough). If you have to have plain-text editing, use separate files... – Alexei Levenkov Aug 30 '19 at 20:51

1 Answer

Why are you allowing your users to send you invalid XML in the first place? You should deny such input. Isn't there a more suitable format for your users to send this data? Like a list of "key: value" strings?
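
To illustrate the "key: value" idea, here is a minimal sketch, assuming one pair per line and that each key is a valid XML element name. The class name KeyValueToXml, the method Build, and the root element name "settings" are made up for this example. Because the XML is built through System.Xml.Linq rather than by string concatenation, the < in the value is escaped when the document is serialized.

```csharp
using System;
using System.Xml.Linq;

public static class KeyValueToXml
{
    // Hypothetical helper: turns "key: value" lines into <key value="..." /> elements.
    // Assumes every key is a valid XML name; XElement throws otherwise.
    public static XElement Build(string input)
    {
        var root = new XElement("settings"); // root name is an assumption
        var lines = input.Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (var line in lines)
        {
            int colon = line.IndexOf(':');
            if (colon <= 0) continue; // skip lines that have no key

            string key = line.Substring(0, colon).Trim();
            string value = line.Substring(colon + 1).Trim();
            root.Add(new XElement(key, new XAttribute("value", value)));
        }
        return root;
    }

    public static void Main()
    {
        var root = Build("title: Tuition and fees paid with (Percent<5000) by Gender\n");

        // The serialized form contains Percent&lt;5000, so it loads without error.
        Console.WriteLine(root);
    }
}
```

With this approach the users never touch angle brackets that belong to the markup, so there is nothing to repair afterwards.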

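Anyway, you can fix this with your Replace approach; just make sure you start after the first < and stop before the last >. Note that this only works when the string is a single element, as in your example, because any nested tags would be escaped too.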

Something like this:

var trimmedXml = xmlString.Trim(); // remove whitespace at either end

// Everything between the first and the last character (the outer < and >)
var innerText = trimmedXml.Substring(1, trimmedXml.Length - 2);
innerText = innerText.Replace("<", "&lt;").Replace(">", "&gt;");

// Put the outer < and > back around the escaped text
xmlString = trimmedXml[0] + innerText + trimmedXml[trimmedXml.Length - 1];

Of course, you'll need to validate that the "XML" string at least starts with < and ends with >.
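
If the file contains more than one element, a different technique may be a better fit: escape < and > only inside the attribute values. The sketch below is not the method described above but a regex-based alternative. It assumes that attribute values are always wrapped in double quotes and never contain a double quote themselves, and that the sequence =" does not appear in element text. The names AttributeEscaper and EscapeAngleBrackets are made up for this example.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml;

public static class AttributeEscaper
{
    // Matches ="...", capturing the text between the double quotes.
    private static readonly Regex AttributeValue = new Regex("=\"([^\"]*)\"");

    // Hypothetical helper: escapes < and > inside double-quoted attribute values only.
    // Markup outside the quotes is left untouched.
    public static string EscapeAngleBrackets(string xml)
    {
        return AttributeValue.Replace(xml, m =>
            "=\"" + m.Groups[1].Value.Replace("<", "&lt;").Replace(">", "&gt;") + "\"");
    }

    public static void Main()
    {
        string xmlString =
            "<title value=\"Tuition and fees paid with (Percent<5000) by Gender\" />";
        string repaired = EscapeAngleBrackets(xmlString);

        var xmldoc = new XmlDocument();
        xmldoc.LoadXml(repaired); // the repaired string is well-formed

        // The parser turns the entity back into the original character.
        Console.WriteLine(xmldoc.DocumentElement.GetAttribute("value"));
    }
}
```

Values that are already escaped are not affected, because only the < and > characters themselves are replaced. Input that breaks the stated assumptions (single-quoted attributes, stray quotes) will still fail to load, so rejecting such uploads remains the safer option.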

CodeCaster