
I am editing a series of XML files, and I need to remove all attributes with the name "foo". This attribute appears in more than one type of element. An example snippet from the XML might be:

<bodymatter id="######">
  <level1 id="######">
    <pagenum page="#####" id="######" foo="######" />
    <h1 id="#####" foo="#####">Header</h1>
    <imggroup id="#######">
               .
               .
              etc.

The best solution I have uses Regex:

Regex regex = new Regex("foo=\"" + ".*?" + "\"", RegexOptions.Singleline);
content = regex.Replace(content, "");

I know built-in XML parsers could help, but ideally I want to make simple XML replacements/removals without having to deal with the baggage of an entire XML parser. Is Regex the best solution in this case?

Edit:

After some research into the XmlDocument class, here is one possible solution I came up with (it removes every attribute whose name is stored in the array "ids"):

private void removeAttributesbyName(string[] ids)
{
    XmlDocument doc = new XmlDocument();
    doc.Load(path);
    // "*" matches every element in the document, nested ones included,
    // so a separate pass over each element's child nodes is not needed.
    XmlNodeList xnlNodes = doc.GetElementsByTagName("*");
    foreach (XmlElement el in xnlNodes)
    {
        foreach (string id in ids)
        {
            if (el.HasAttribute(id))
            {
                el.RemoveAttribute(id);
            }
        }
    }
    // Write the modified document back; until then the changes exist only in memory.
    doc.Save(path);
}

I don't know if this is as efficient as it possibly could be, but I've tested it and it seems to work fine.

Deduplicator
CW_20161

3 Answers


Do not use regex for XML manipulation. You can use Linq to XML:

XDocument xdoc = XDocument.Parse(xml);
foreach (var node in xdoc.Descendants().Where(e => e.Attribute("foo")!=null))
{
    node.Attribute("foo").Remove();
}

string result = xdoc.ToString();
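
If several attribute names need to go at once (as in the question's edit), the same idea can be written more compactly. This is only a sketch: the class name AttributeStripper and the helper RemoveAttributes are made up for illustration, and it assumes the attributes are matched by local name without regard to namespace.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class AttributeStripper
{
    // Hypothetical helper: removes every attribute whose local name is in `names`.
    public static string RemoveAttributes(string xml, params string[] names)
    {
        XDocument xdoc = XDocument.Parse(xml);

        // Attributes() flattens the attributes of all descendant elements;
        // the Remove() extension snapshots the sequence before deleting,
        // so the tree is not modified while it is being enumerated.
        xdoc.Descendants()
            .Attributes()
            .Where(a => names.Contains(a.Name.LocalName))
            .Remove();

        return xdoc.ToString(SaveOptions.DisableFormatting);
    }

    public static void Main()
    {
        string xml = "<level1 id=\"a\"><pagenum page=\"1\" foo=\"x\" />"
                   + "<h1 id=\"b\" foo=\"y\">Header</h1></level1>";
        Console.WriteLine(RemoveAttributes(xml, "foo"));
    }
}
```

For a single name, xdoc.Descendants().Attributes("foo").Remove() should do the same job in one line.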
fcuesta
  • I figured Regex was not a good idea, but I was reluctant to go digging through all the method libraries for built-in XML parsers. Now I am looking into the XmlDocument class, however, and I may also make use of what you have here. Thanks! – CW_20161 Jul 26 '13 at 23:33
  • Just to help out anyone like me who is a slow learner: I was getting error CS1061, saying I was missing a reference. To be sure, you need using directives for "System.Xml.Linq" and "System.Linq" for the above code to work. – Steve Hibbert Jul 15 '19 at 14:41

Is Regex the best solution in this case?

No.
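
To see why, here is a small sketch that runs the pattern from the question over a made-up input. The input string and the class name RegexPitfall are assumptions chosen to show the failure; they are not taken from the asker's files.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexPitfall
{
    // Same pattern as in the question: foo="...", matched lazily.
    public static string Strip(string content)
    {
        Regex regex = new Regex("foo=\"" + ".*?" + "\"", RegexOptions.Singleline);
        return regex.Replace(content, "");
    }

    public static void Main()
    {
        // Hypothetical input: an attribute whose name merely ends in "foo",
        // the real foo attribute, and the text foo="bar" in element content.
        string content = "<p datafoo=\"1\" foo=\"2\">say foo=\"bar\"</p>";
        Console.WriteLine(Strip(content));
        // prints: <p data >say </p>
    }
}
```

The pattern cannot tell an attribute named foo from the tail of datafoo or from ordinary text, so the result here is no longer well-formed XML. It would also miss an attribute written with single quotes, such as foo='2'.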

You'll want to use something that works on XML at the object level (as an XmlElement, for example) and not at the string level.
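
As a rough sketch of what that could look like with XmlDocument, the following selects the attribute nodes directly with XPath instead of visiting every element. The class name XmlDocumentStripper and the helper RemoveAttribute are hypothetical, and the code assumes `name` is a plain, unprefixed attribute name that is safe to concatenate into an XPath expression.

```csharp
using System;
using System.Linq;
using System.Xml;

public static class XmlDocumentStripper
{
    // Hypothetical helper: removes every attribute called `name` from the document.
    public static string RemoveAttribute(string xml, string name)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);

        // "//@name" selects all attributes with that name anywhere in the tree.
        // ToList() copies the matches first so that removing them does not
        // interfere with the node list while it is being read.
        var attrs = doc.SelectNodes("//@" + name).Cast<XmlAttribute>().ToList();

        foreach (XmlAttribute attr in attrs)
        {
            attr.OwnerElement.RemoveAttributeNode(attr);
        }

        return doc.OuterXml;
    }

    public static void Main()
    {
        string xml = "<level1 id=\"a\"><pagenum page=\"1\" foo=\"x\" />"
                   + "<h1 id=\"b\" foo=\"y\">Header</h1></level1>";
        Console.WriteLine(RemoveAttribute(xml, "foo"));
    }
}
```

Because only real attribute nodes are selected, element text that happens to contain foo="..." is left alone.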

Andrew Coonce

I use the following to remove namespaces. It might also work for removing attributes from other nodes.

        // Read the file into a DataSet, clear its namespace, and serialize it back.
        string outXML;
        using (FileStream fs = new FileStream(filePath, FileMode.Open))
        using (StreamReader sr = new StreamReader(fs))
        using (DataSet ds = new DataSet())
        {
            ds.ReadXml(sr);
            ds.Namespace = "";
            outXML = ds.GetXml();
        }