I use "application 1" to create and edit XHTML files. It has an option to insert annotations into the content of non-empty elements such as p, h1, h2, td, etc., which results in mixed-content XML like this:
<p>Hello <NS1:annotation [...SomeAttributes...]>everybody</NS1:annotation> out there!</p>
For translation purposes I have to export these XHTML files to "application 2", which can't deal with these inline elements. As the annotations are not part of the content to be translated, removing them before exporting to application 2 would be a perfect workaround:
<p>Hello everybody out there!</p>
Removing the nodes from an XmlDocument reliably finds and removes the inline XML elements, but it also deletes the content of the annotation element, losing the word "everybody" in the example above:
<p>Hello out there!</p>
What I need instead is to "unbind" (unwrap) the content of these inline elements into the content of the parent element. But so far I haven't found a method in the C# XML tools that does the job.
So far I first save the XHTML file, re-open it as a text file and use regular expressions to remove the annotations. I can even use C# methods for it:
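using System.Text.RegularExpressions;

TextFile txt = new TextFile();
string input = txt.ReadFile(filename);
// [^>]* and the lazy (.*?) keep each match inside a single annotation,
// so several annotations on one line are unwrapped separately.
string pattern = @"<NS1:annotation[^>]*>(.*?)</NS1:annotation>";
string replacement = "$1";
// Singleline lets an annotation's content span line breaks.
Regex rgx = new Regex(pattern, RegexOptions.Singleline);
string result = rgx.Replace(input, replacement);
TextFile.Write(filename, result);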
This is doubtless a better solution as it doesn't lose the content of the annotation, but I wonder whether there really is no solution based on the C# XML tools that does the job.
Anybody out there who knows it?
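To make the "unbinding" concrete, here is a minimal sketch of one possible way to do it with XmlDocument: each child of an annotation element is moved in front of the annotation, and the emptied annotation is then removed. The class name AnnotationUnwrapper, the method Unwrap and the namespace URI urn:example:ns1 are placeholders for illustration only; the URI has to be whatever the NS1 prefix is really bound to in the files.

```csharp
using System;
using System.Linq;
using System.Xml;

public static class AnnotationUnwrapper
{
    // Assumption: placeholder URI. Replace it with the URI that the
    // NS1 prefix is bound to in the real XHTML files.
    public const string AnnotationNamespace = "urn:example:ns1";

    public static string Unwrap(string xml)
    {
        // Preserve whitespace so the spaces around the annotation survive.
        var doc = new XmlDocument { PreserveWhitespace = true };
        doc.LoadXml(xml);

        var ns = new XmlNamespaceManager(doc.NameTable);
        ns.AddNamespace("NS1", AnnotationNamespace);

        // Copy the matches into a list so the tree can be changed while looping.
        var annotations = doc.SelectNodes("//NS1:annotation", ns)
                             .Cast<XmlNode>()
                             .ToList();

        foreach (XmlNode annotation in annotations)
        {
            XmlNode parent = annotation.ParentNode;

            // InsertBefore detaches a node from its old position, so this
            // moves every child (text or element) in front of the annotation
            // while keeping the original order.
            while (annotation.HasChildNodes)
            {
                parent.InsertBefore(annotation.FirstChild, annotation);
            }

            // The annotation element is now empty and can be dropped.
            parent.RemoveChild(annotation);
        }

        return doc.OuterXml;
    }
}
```

Called on the example paragraph (with the NS1 prefix declared on the p element), this should keep the word "everybody" in place and drop only the annotation tags. If the files carry a DOCTYPE, loading may try to resolve the DTD; setting doc.XmlResolver = null before loading is one way to avoid that.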
is html not xml.
– jdweng Jun 07 '18 at 23:38