I am trying to use Regex in C# to match a section of an XML document and wrap that section inside a new tag.

For example, I have this section:

<intro>
    <p>this is the first section of content</p>
    <p> this is another</p>
</intro>

and I want it to look like this:

<intro>
   <bodyText>
      <p> this is asdf</p>
      <p> yada yada </p>
   </bodyText>
</intro>

Any thoughts?

I was considering doing it with the XPath classes in C#, or by just reading in the document and using Regex, but I can't seem to figure it out either way.

Here is my attempt so far:

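        // Read the whole file into memory; "using" closes the reader even on error.
        string content;
        using (StreamReader reader = new StreamReader(filePath))
        {
            content = reader.ReadToEnd();
        }

        /* The regex stuff would go here */

        // Write the (modified) content back to the same file.
        using (StreamWriter writer = new StreamWriter(filePath))
        {
            writer.Write(content);
        }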

Thanks!

Wiktor Stribiżew
samandmoore
  • Obligatory link: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – Carl Norum Jun 04 '10 at 20:55
  • Seems like you might want XSLT. – jball Jun 04 '10 at 20:56
  • What version of .NET are you using? – Mark Byers Jun 04 '10 at 23:53
  • I'm using 3.5. I am already using XSLT for some things and might use it for this too; I was just hoping to find a quick C# solution so that I can spend more time writing presentation code in XSLT rather than organizational code. – samandmoore Jun 07 '10 at 20:10

2 Answers

I wouldn't recommend regular expressions for this task. Instead you can do it using LINQ to XML. For example, here is how you could wrap some tags inside a new tag:

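// Requires: using System; using System.Xml.Linq;
XDocument doc = XDocument.Load("input.xml");
// Select the <p> children of the root element (<intro> in the example).
var section = doc.Root.Elements("p");
// Replace the root's content with a single <bodyText> element wrapping those <p> elements.
doc.Root.ReplaceAll(new XElement("bodyText", section));
Console.WriteLine(doc.ToString());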

Result:

<intro>
  <bodyText>
    <p>this is the first section of content</p>
    <p> this is another</p>
  </bodyText>
</intro>

I assume that your actual document differs considerably from the example you posted, so the code will need some adjustment to fit your requirements, but if you read the documentation for XDocument you should be able to do what you want.
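As one possible adjustment, here is a minimal sketch that wraps the content of every <intro> element anywhere in the document, not just the root. The class name `BodyTextWrapper`, the method name `WrapIntroContent`, and the rule of skipping elements that already contain a <bodyText> child are assumptions made for illustration; they are not part of the question.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class BodyTextWrapper
{
    // Wraps the child nodes of every <intro> element in a new <bodyText> element.
    // Assumption: an <intro> that already has a <bodyText> child is left alone.
    public static void WrapIntroContent(XDocument doc)
    {
        // ToList() snapshots the matches so the tree can be modified while looping.
        foreach (XElement intro in doc.Descendants("intro").ToList())
        {
            if (intro.Element("bodyText") != null)
            {
                continue;
            }

            // The XElement constructor copies nodes that already have a parent,
            // so the wrapper is fully built before the originals are removed.
            XElement wrapper = new XElement("bodyText", intro.Nodes());

            // ReplaceNodes swaps the child nodes but keeps <intro>'s attributes.
            intro.ReplaceNodes(wrapper);
        }
    }

    public static void Main()
    {
        XDocument doc = XDocument.Parse(
            "<intro><p>this is the first section of content</p><p> this is another</p></intro>");
        WrapIntroContent(doc);
        Console.WriteLine(doc);
    }
}
```

For a file on disk, the same method could be called between `XDocument.Load(filePath)` and `doc.Save(filePath)`, which would replace the StreamReader/StreamWriter code from the question.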

Mark Byers
  • Though I agree with this approach, I don't think that code actually does what OP wanted. – hemp Jun 04 '10 at 21:02
  • @hemp: Yes, I'm not claiming that he can blindly copy & paste this code into his project and all his problems will be solved, but hopefully it's enough of a hint to get started. – Mark Byers Jun 04 '10 at 21:05
  • I read through it again and tried it myself, I was wrong - your code does exactly what he asked. Sorry! – hemp Jun 04 '10 at 23:43

I would suggest using System.Xml and XPath. XML, like HTML, is not a regular language, which causes problems when you try to parse it with regular expressions.
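To make the problem concrete, here is a small sketch of a regular expression going wrong on nested elements. The class name `RegexNestingDemo` and the nested <intro> input are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexNestingDemo
{
    // Returns whatever a naive "open tag, anything, close tag" pattern captures.
    public static string FirstInner(string xml)
    {
        // Lazy match: stops at the first </intro> it sees, even if that close
        // tag belongs to a nested <intro> rather than the one that was opened.
        Match m = Regex.Match(xml, "<intro>(.*?)</intro>", RegexOptions.Singleline);
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        // The outer element's real content is "<intro>a</intro>",
        // but the pattern captures only "<intro>a".
        Console.WriteLine(FirstInner("<intro><intro>a</intro></intro>"));
    }
}
```

A greedy `.*` has the opposite problem when two sibling elements follow each other, which is why a real XML parser is the safer tool here.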

Use something like

// Requires: using System.Xml;
XmlDocument doc = new XmlDocument();
doc.Load("Path to your xml document");
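From there, the wrapping itself might look something like the sketch below, which selects the <intro> elements with XPath and moves their children into a new <bodyText> element. The class and method names are hypothetical, and the XPath expression `//intro` assumes the elements are not in an XML namespace.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class XPathBodyTextWrapper
{
    // Moves every child node of each <intro> element into a new <bodyText> child.
    public static void WrapIntroContent(XmlDocument doc)
    {
        // Copy the XPath matches into a list first, so the document is not
        // modified while the node list returned by SelectNodes is being read.
        List<XmlNode> intros = new List<XmlNode>();
        foreach (XmlNode node in doc.SelectNodes("//intro"))
        {
            intros.Add(node);
        }

        foreach (XmlNode intro in intros)
        {
            XmlElement wrapper = doc.CreateElement("bodyText");

            // AppendChild removes a node from its current position before adding it,
            // so this loop drains intro's children into the wrapper in order.
            while (intro.HasChildNodes)
            {
                wrapper.AppendChild(intro.FirstChild);
            }

            intro.AppendChild(wrapper);
        }
    }

    public static void Main()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<intro><p>this is the first section of content</p><p> this is another</p></intro>");
        WrapIntroContent(doc);
        Console.WriteLine(doc.OuterXml);
    }
}
```

With a file on disk, `doc.Load(path)` before the call and `doc.Save(path)` after it would write the change back.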

Enjoy!

Doug