The classes in the System.Xml.Linq namespace escape and unescape special characters automatically for you.
using System;
using System.Xml.Linq;

string xml = """
<fragment>
<set-header name="MyHeader">
<value>Sample text &lt; </value>
</set-header>
</fragment>
<fragment>
<set-header name="MyHeader">
<value>Sample text &gt; </value>
</set-header>
</fragment>
""";
var doc = XDocument.Parse($"<root>{xml}</root>");
foreach (XElement element in doc.Descendants("value")) {
    Console.WriteLine(element.Value);
}
Prints:
Sample text <
Sample text >
Note that you must embed these fragments in a single root element; otherwise you will get an exception telling you that there are multiple root elements.
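As a minimal sketch of why the wrapper is needed, the following helper checks whether a text parses as a single document and reads the values once the fragments are wrapped. The class name FragmentDemo, the method names, and the wrapper element name root are illustrative choices for this example only.

```csharp
using System;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class FragmentDemo
{
    // Returns true if the text parses as one well-formed XML document.
    public static bool IsWellFormedDocument(string text)
    {
        try
        {
            XDocument.Parse(text);
            return true;
        }
        catch (XmlException)
        {
            // Thrown, among other cases, when the text has more than one root element.
            return false;
        }
    }

    // Wraps the fragments in an artificial root element (hypothetical name "root")
    // and returns the unescaped text of every <value> element.
    public static string[] ReadValues(string fragments)
    {
        var doc = XDocument.Parse($"<root>{fragments}</root>");
        return doc.Descendants("value").Select(e => e.Value).ToArray();
    }
}
```

With this sketch, FragmentDemo.IsWellFormedDocument("<a/><b/>") should return false, while the same text wrapped in a root element parses without an exception.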
The other way round works as well:
var doc = new XDocument(new XElement("test", "value = <x> "));
string xml = doc.ToString();
Console.WriteLine(xml);
Prints:
<test>value = &lt;x&gt; </test>
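To illustrate that escaping on output and unescaping on input mirror each other, here is a small round-trip sketch. The names RoundTripDemo, Serialize, and ReadBack, as well as the attribute name attr, are assumptions made for the example; only XElement, XAttribute, ToString, and Parse come from System.Xml.Linq.

```csharp
using System;
using System.Xml.Linq;

public static class RoundTripDemo
{
    // Builds <test attr="...">text</test>; special characters in the
    // attribute value and in the text are escaped when the element is written.
    public static string Serialize(string text, string attributeValue)
    {
        var element = new XElement("test", new XAttribute("attr", attributeValue), text);
        return element.ToString();
    }

    // Parses the markup again and returns the unescaped element text.
    public static string ReadBack(string markup) => XElement.Parse(markup).Value;
}
```

Passing the result of Serialize to ReadBack should give back the original text unchanged, so application code never has to deal with the entities itself.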
An attempt to fix the malformed XML:
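// Escape a stray < (one that does not start an opening or closing tag)
string xml = Regex.Replace(malformedXml, @"<([^\w/])", @"&lt;$1");
// Escape a stray > (one that does not follow a word character or a double quote)
xml = Regex.Replace(xml, @"([^\w""])>", @"$1&gt;");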
This works only if the < can be identified as not being part of an opening or closing tag. The regex searches for a < that is not followed by a word character or a / and replaces it with &lt; followed by that character (capture group 1, written as $1 in the replacement).
The second Replace call replaces every > that is not preceded by a word character or a double quote with &gt;, again keeping the preceding character via $1.
Test:
using System;
using System.Text.RegularExpressions;

string malformedXml = """
<fragment>
<set-header name="MyHeader">
<value>Sample text < > </value>
</set-header>
</fragment>
""";
string xml = Regex.Replace(malformedXml, @"<([^\w/])", @"&lt;$1");
xml = Regex.Replace(xml, @"([^\w""])>", @"$1&gt;");
Console.WriteLine(xml);
Prints:
<fragment>
<set-header name="MyHeader">
<value>Sample text &lt; &gt; </value>
</set-header>
</fragment>
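This works for this particular example of malformed XML, but since we don't know all the ways in which the XML could be malformed, there is no guarantee that it will always work. The patterns can even damage well-formed input: a comment such as <!-- note --> would have both of its delimiters escaped.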
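Because the repair is only a heuristic, one defensive option is to parse the repaired text and accept it only if it is well-formed. The following is a sketch under that assumption; the class XmlRepair and its methods EscapeStrayBrackets and TryRepair are hypothetical helpers, not an existing API.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml;
using System.Xml.Linq;

public static class XmlRepair
{
    // Applies the two heuristic replacements: escape a '<' that does not start
    // a tag, then a '>' that does not follow a word character or a double quote.
    public static string EscapeStrayBrackets(string malformedXml)
    {
        string xml = Regex.Replace(malformedXml, @"<([^\w/])", "&lt;$1");
        return Regex.Replace(xml, @"([^\w""])>", "$1&gt;");
    }

    // Returns true and the parsed document only if the repaired text is well-formed.
    public static bool TryRepair(string malformedXml, out XDocument? document)
    {
        try
        {
            document = XDocument.Parse(EscapeStrayBrackets(malformedXml));
            return true;
        }
        catch (XmlException)
        {
            // The heuristic did not produce well-formed XML; report failure
            // instead of passing damaged data on.
            document = null;
            return false;
        }
    }
}
```

For the sample above, TryRepair should succeed. For an input containing a self-closing tag such as <a><br/></a>, it should return false, because the second pattern also escapes the > that follows the /. The check catches that case but does not fix it.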
The only good solution is to fix the problem at the source, i.e., by the provider of this XML.