
I have an XML file that looks like this:

```xml
...
<e1>
   <e2>
      <e3>content1.1</e3>
      <e3>content1.2</e3>
      ...
      <e3>content1.n</e3>
   </e2>
   <e2>
      <e3>content2.1</e3>
      <e3>content2.2</e3>
      ...
      <e3>content2.n</e3>
   </e2>
   ...
</e1>
...
```

I need a regex that, given the XML string (a bunch of e1 elements), would match all e2 elements that have a child e3 element with content contentx. In other words, the regex would match all e2 elements (which could have different e1 parents) where each of these elements has at least one e3 child whose content equals contentx.

Zsolt Botykai
  • You should use an XML-parsing library. Which language are you coding in? – Sufian Latif Jan 30 '12 at 19:03
  • Using regular expressions is not a good approach for parsing XML; they have a lot of issues in this context. I highly recommend using XPath. – Gaim Jan 30 '12 at 19:05
  • I'm using C#. The problem with using an XML parser is that the file I'm trying to parse can contain invalid XML. – Mayad AL-Saidi Jan 30 '12 at 21:27
  • If you need to match stuff that isn't XML, why did you tell us it was XML? Matching XML using regular expressions is pretty-well impossible, matching stuff that might or might not be XML is best not even to think about. – Michael Kay Jan 30 '12 at 23:50
  • I said it was XML to help you guys understand the format. I was looking for a regex so I didn't think that whether or not it was valid XML would matter. Sorry about the confusion. Thanks for your help. – Mayad AL-Saidi Jan 31 '12 at 00:27

1 Answer


Don't use a regex to parse XML. Just don't do it. This is precisely the sort of thing that XPath was made to do. I would offer an XPath expression, but I'm not completely sure what you are trying to match.
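Assuming the goal is the one stated in the question (every e2, under any e1, that has at least one e3 child whose text is exactly contentx), a minimal Python sketch with the standard library's `xml.etree.ElementTree` might look like the following. ElementTree supports only a subset of XPath, but its `[tag='text']` predicate (Python 3.7+) covers this case. Because the input is a sequence of e1 elements rather than a single-rooted document, the sketch wraps it in a synthetic `<root>` element before parsing; `find_matching_e2`, the `<root>` wrapper, and the sample data are illustrative assumptions, not part of the original question.

```python
import xml.etree.ElementTree as ET

# Illustrative input: two top-level e1 elements, so not a single-rooted document.
SAMPLE = """
<e1>
   <e2>
      <e3>content1.1</e3>
      <e3>contentx</e3>
   </e2>
   <e2>
      <e3>content2.1</e3>
   </e2>
</e1>
<e1>
   <e2>
      <e3>contentx</e3>
   </e2>
</e1>
"""


def find_matching_e2(fragment: str, target: str) -> list[ET.Element]:
    """Return every e2 element that has an e3 child whose text equals target."""
    # Hypothetical wrapper: several top-level e1 elements are not well-formed
    # on their own, so enclose them in a synthetic root element.
    root = ET.fromstring(f"<root>{fragment}</root>")
    # ElementTree's XPath subset: [e3='...'] keeps e2 elements that have an
    # e3 child whose full text content equals the quoted value.
    # Caution: target is interpolated as-is and must not contain a quote.
    return root.findall(f".//e2[e3='{target}']")


for e2 in find_matching_e2(SAMPLE, "contentx"):
    print([e3.text for e3 in e2.findall("e3")])
# ['content1.1', 'contentx']
# ['contentx']
```

In C#, `XmlDocument.SelectNodes` accepts a full XPath 1.0 expression such as `//e2[e3='contentx']`, but, like this sketch, it requires well-formed input.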

D.Shawley
  • Probably something like `/e1/e2[e3='contentx']` – Daniel Haley Jan 30 '12 at 19:08
  • You're right, using XPath would probably be easier. The reason I'm trying to use a regex is that the string I'm dealing with here is actually HTML, which might not always be valid XML. I'm coding in C#. Any suggestions? – Mayad AL-Saidi Jan 30 '12 at 21:24
  • @MayadAL-Saidi - Take a look at this question: http://stackoverflow.com/questions/56107/what-is-the-best-way-to-parse-html-in-c – Daniel Haley Jan 30 '12 at 22:51
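The linked question is about parsing HTML in C#. To illustrate the general idea of tolerating malformed markup without a regex, here is a Python sketch (not a C# solution, and not any library discussed there) built on the standard library's `html.parser.HTMLParser`, which reports start tags, end tags and text without requiring well-formed input. `E2Finder` and `e2_blocks_containing` are hypothetical names, and the sketch assumes e2 elements never nest; a missing `</e2>` is treated as ending at the next `<e2>` or at the end of input, which real pages may not satisfy.

```python
from html.parser import HTMLParser


class E2Finder(HTMLParser):
    """Collect the e3 texts of every e2 block, tolerating missing close tags.

    Hypothetical helper. Assumes e2 elements are never nested inside one
    another; an unclosed e2 is treated as ending at the next <e2> or at the
    end of the input.
    """

    def __init__(self):
        super().__init__()
        self.blocks = []        # one list of e3 texts per e2 element
        self._current = None    # e3 texts of the e2 currently open, if any
        self._e3_text = None    # text chunks of the e3 currently open, if any

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lower case.
        if tag == "e2":
            self._close_e2()    # a previous unclosed e2 ends here
            self._current = []
        elif tag == "e3" and self._current is not None:
            self._close_e3()    # a previous unclosed e3 ends here
            self._e3_text = []

    def handle_endtag(self, tag):
        if tag == "e3":
            self._close_e3()
        elif tag == "e2":
            self._close_e2()

    def handle_data(self, data):
        if self._e3_text is not None:
            self._e3_text.append(data)

    def close(self):
        super().close()         # flush any buffered input first
        self._close_e2()        # then finish anything left open at EOF

    def _close_e3(self):
        if self._e3_text is not None and self._current is not None:
            # Surrounding whitespace is stripped before comparison.
            self._current.append("".join(self._e3_text).strip())
        self._e3_text = None

    def _close_e2(self):
        self._close_e3()
        if self._current is not None:
            self.blocks.append(self._current)
        self._current = None


def e2_blocks_containing(markup, target):
    """Return the e3 texts of each e2 that has an e3 child equal to target."""
    parser = E2Finder()
    parser.feed(markup)
    parser.close()
    return [block for block in parser.blocks if target in block]


# The first and last e2 below are both missing their </e2> tags.
MESSY = """<e1>
  <e2><e3>content1.1</e3><e3>contentx</e3>
  <e2><e3>content2.1</e3></e2>
  <e2><e3>contentx</e3>
</e1>
"""

print(e2_blocks_containing(MESSY, "contentx"))
# [['content1.1', 'contentx'], ['contentx']]
```

Because the parser sees tags as tokens rather than raw text, attributes, comments, or extra whitespace inside tags do not break the match the way they would for a regex; it still depends on the stated nesting assumption.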