I use "application 1" to create and edit XHTML files. It has an option to insert annotations into the content of non-empty elements such as p, h1, h2, td, etc., which results in mixed-content XML like this:

<p>Hello <NS1:annotation [...SomeAttributes...]>everybody</NS1:annotation> out there!</p>

For translation purposes I have to export these XHTML files to "application 2", which can't deal with these internal elements. As the annotations are not part of the desired content in the translations, removing them before the export to application 2 would be a perfect workaround:

<p>Hello everybody out there!</p>

Removing nodes from an XmlDocument reliably finds and removes the internal XML elements, but it also deletes the content of the annotation element, losing the word "everybody" in the example above:

<p>Hello out there!</p>

What I need instead is to "unbind" the content of these internal elements into the content of the parent element. But so far I haven't found a method in the C# XML tools that does the job.

So far I first save the XHTML file, re-open it as a text file and use regular expressions to remove the annotation. I can even use C# methods for it:

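// Needs: using System.IO; using System.Text.RegularExpressions;
// File.ReadAllText/File.WriteAllText stand in here for the TextFile helper class.
string input = File.ReadAllText(filename);

// [^>]* and the lazy (.*?) keep each match inside one annotation,
// so several annotations on the same line are replaced separately.
string pattern = @"<NS1:annotation[^>]*>(.*?)</NS1:annotation>";
string replacement = "$1";
Regex rgx = new Regex(pattern, RegexOptions.Singleline);
string result = rgx.Replace(input, replacement);

File.WriteAllText(filename, result);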

This is doubtless a better solution, as it doesn't lose the content of the annotation, but I wonder whether there is really no solution based on the C# XML tools that does the job.

Does anybody out there know one?

oelgoetz

1 Answer

I think I found an answer using XmlDocument. The key is that in mixed XML nodes the text surrounding the inner element can be addressed as XML nodes too. I wasn't aware of this ...
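
To make this visible, here is a minimal sketch that lists the child nodes of the example paragraph. The class and method names (ChildNodeDemo, DescribeChildren) are made up for illustration, the namespace URI is just a placeholder, and the annotation attributes are left out. For the sample input it should report a Text node, an Element node and another Text node.

```csharp
using System;
using System.Xml;

class ChildNodeDemo
{
    // Returns one "NodeType: text" entry per child node of the root element.
    public static string[] DescribeChildren(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        XmlNodeList children = doc.DocumentElement.ChildNodes;
        var result = new string[children.Count];
        for (int i = 0; i < children.Count; i++)
        {
            result[i] = children[i].NodeType + ": " + children[i].InnerText;
        }
        return result;
    }

    static void Main()
    {
        // Placeholder namespace URI; use the one your files really declare.
        string xml = "<p xmlns:NS1=\"http://www.someurl.net\">Hello " +
                     "<NS1:annotation>everybody</NS1:annotation> out there!</p>";
        foreach (string line in DescribeChildren(xml))
        {
            Console.WriteLine(line);
        }
    }
}
```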

The following function unbinds the content of the mixed node and releases it into the content of the parent node. I haven't tested it for nodes containing multiple annotations, but that's enough for me at the moment ...

private void removeAnnotations(XmlDocument doc)
{
    XmlNamespaceManager manager = new XmlNamespaceManager(doc.NameTable);
    manager.AddNamespace("NS1", "http://www.someurl.net");
    XmlNodeList annotations = doc.SelectNodes("//NS1:annotation", manager);

    for (int i = 0; i < annotations.Count; i++)
    {
        XmlNode annotation = annotations[i];
        // In mixed XML the siblings are XML text nodes, so we buffer their text:
        string s0 = "";
        if (annotation.PreviousSibling != null) s0 = annotation.PreviousSibling.InnerText;
        string s2 = "";
        if (annotation.NextSibling != null) s2 = annotation.NextSibling.InnerText;
        // Buffer the content of the annotation itself.
        string s1 = annotation.InnerText;
        // Buffer the link to the parent node before we remove the annotation.
        XmlNode parent = annotation.ParentNode;
        // Now remove the annotation ...
        parent.RemoveChild(annotation);
        // ... and apply the new text to the parent element.
        // Note: setting InnerText replaces ALL children of the parent, so this only
        // works when the parent holds nothing but text around a single annotation.
        parent.InnerText = s0 + s1 + s2;
    }
}
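
For parents that contain several annotations or other inline elements, a variant that moves the annotation's children up instead of rebuilding the parent's text might be more robust. The following is only a sketch under assumptions: the names AnnotationUnwrapper and UnwrapAnnotations are invented for illustration, the namespace URI is the same placeholder as above, and I have only reasoned it through for small inputs like the one in Main.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

class AnnotationUnwrapper
{
    // Replaces every NS1:annotation element with its own child nodes,
    // leaving all other children of the parent untouched.
    public static void UnwrapAnnotations(XmlDocument doc, string namespaceUri)
    {
        var manager = new XmlNamespaceManager(doc.NameTable);
        manager.AddNamespace("NS1", namespaceUri);

        // Copy the matches first so that editing the tree cannot disturb the iteration.
        var annotations = new List<XmlNode>();
        foreach (XmlNode node in doc.SelectNodes("//NS1:annotation", manager))
        {
            annotations.Add(node);
        }

        foreach (XmlNode annotation in annotations)
        {
            XmlNode parent = annotation.ParentNode;
            // InsertBefore detaches a node from its old position, so this loop
            // moves each child (text or element) in front of the annotation, in order.
            while (annotation.FirstChild != null)
            {
                parent.InsertBefore(annotation.FirstChild, annotation);
            }
            // The annotation is empty now and can be removed safely.
            parent.RemoveChild(annotation);
        }
    }

    static void Main()
    {
        var doc = new XmlDocument();
        // Placeholder namespace URI and a paragraph with two annotations.
        doc.LoadXml("<p xmlns:NS1=\"http://www.someurl.net\">Hello " +
                    "<NS1:annotation>everybody</NS1:annotation> out " +
                    "<NS1:annotation>there</NS1:annotation>!</p>");
        UnwrapAnnotations(doc, "http://www.someurl.net");
        Console.WriteLine(doc.DocumentElement.InnerText);
    }
}
```

Because the text is moved rather than copied, the parent may end up with several adjacent text nodes; calling doc.Normalize() afterwards should merge them if that matters for the export.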
oelgoetz