
I am working on a project that slots into another project. The project I am slotting into produces a weird XML syntax that cannot be changed.

It has a weird element. To illustrate:

<DocumentRoot>
   <Parent>
      <Child-Which-Can-Occur-Random-Number-Of-Times> Data </Child-Which-Can-Occur-Random-Number-Of-Times>
      <Weird-Elt_12309843028938> Data I need </Weird-Elt_12309843028938>
      <Weird-Elt_84509843323232> Data I need </Weird-Elt_84509843323232>

   </Parent>
   <Parent>
      <Child-Which-Can-Occur-Random-Number-Of-Times> Data </Child-Which-Can-Occur-Random-Number-Of-Times>
      <Weird-Elt_12309843028938> Data I need </Weird-Elt_12309843028938>
   </Parent>
   <Parent>
      <Child-Which-Can-Occur-Random-Number-Of-Times> Data </Child-Which-Can-Occur-Random-Number-Of-Times>
      <Weird-Elt_12309843028938> Data I need </Weird-Elt_12309843028938>
   </Parent>
</DocumentRoot>

What I need: the name of each "Weird-Elt" tag and its contents.

Problem: the XML cannot be changed. The Weird-Elt element can occur any number of times, as can the element above it, Child-Which-Can-Occur-Random-Number-Of-Times.

The only solution I can see is to use LINQ to XML in conjunction with a regular expression to match the name of Weird-Elt.

Am I right about this?

Simon Kiely
  • Why can't the syntax be changed? If your XML can't be parsed by an XML parser, your design is flawed. – Msonic Apr 18 '12 at 17:10
  • http://stackoverflow.com/a/1732454/63011 I guess the same applies to XML – Paolo Moretti Apr 18 '12 at 17:11
  • No, you should not need regexes. What about accessing "parent".lastChild? – Bergi Apr 18 '12 at 17:11
  • @Bergi In his example, there is more than one Weird-Elt tag per parent, so using lastChild will not work. – Msonic Apr 18 '12 at 17:12
  • How do you know which element you want? – SLaks Apr 18 '12 at 17:12
  • @SLaks, you know because it is of a certain format and because it is a child of Parent that is not Child-Which-Can-Occur-Random-Number-Of-Times – Simon Kiely Apr 18 '12 at 17:15
  • @Msonic: Yes, but every selection expression can be described without regexes. Do you want "all elements after those which are named like the firstchild"? – Bergi Apr 18 '12 at 17:15
  • @Bergi I want to get the contents of the Weird-Elt element, which can occur a random number of times - but I also wish to extract the name of this element. – Simon Kiely Apr 18 '12 at 17:17
  • Yes. Take the children, then remove the first child and every following child that has the same tag name as the first. The weird elements are what is left, and you can access their tag names and contents. – Bergi Apr 18 '12 at 17:20
  • Change your title to something more appropriate. The way it sounds now is only attracting downvotes. And your question is not about parsing XML with regex, it's about finding a tag with a weird name that follows a pattern. – stema Apr 19 '12 at 09:36
  • @PaoloMoretti what do you want to achieve with that link? Have you read the question? It has nothing to do with that "parse XML with regex" questions. – stema Apr 19 '12 at 09:39
  • @steva So you are suggesting that a question entitled _Is it ever appropriate to parse XML with regular expressions_ has nothing to do with _parsing XML with regex_. That's interesting :-) – Paolo Moretti Apr 19 '12 at 09:50

2 Answers

var nodeList = xmlDoc.DocumentElement.SelectNodes("//*[starts-with(name(),'Weird-Elt_')]");

If the name does not always start with Weird-Elt_, try contains instead:

var nodeList = xmlDoc.DocumentElement.SelectNodes("//*[contains(name(),'Weird-Elt_')]");
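Since the question asks for both the element name and its contents, here is a minimal sketch of how the node list from the starts-with query might be read. The class name WeirdEltXPath and the method Extract are hypothetical names used only for illustration, and the sketch assumes the XML arrives as a string.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper, not part of the original answer.
public static class WeirdEltXPath
{
    // Returns (element name, trimmed text) pairs for every element
    // whose name starts with "Weird-Elt_".
    public static List<KeyValuePair<string, string>> Extract(string xml)
    {
        var xmlDoc = new XmlDocument();
        xmlDoc.LoadXml(xml);

        var results = new List<KeyValuePair<string, string>>();
        XmlNodeList nodeList = xmlDoc.DocumentElement.SelectNodes(
            "//*[starts-with(name(),'Weird-Elt_')]");

        foreach (XmlNode node in nodeList)
        {
            // node.Name is the full tag name; InnerText is the text content.
            results.Add(new KeyValuePair<string, string>(
                node.Name, node.InnerText.Trim()));
        }
        return results;
    }
}
```

Each pair's Key holds the tag name (for example Weird-Elt_12309843028938) and Value holds the text between the tags.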
Damith

Yes, LINQ to XML with a regular expression will work. Here is a sample:

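// Requires: using System.Linq; using System.Text.RegularExpressions; using System.Xml.Linq;
// Anchored at the start so the prefix cannot match in the middle of another name.
Regex regEx = new Regex(@"^Weird-Elt_", RegexOptions.Compiled);

// xml1 is the XML document as a string.
XDocument doc = XDocument.Parse(xml1);
var x1 = from e in doc.Descendants("Parent").Descendants()
         where regEx.IsMatch(e.Name.LocalName)
         select e;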
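If the names always begin with the same fixed prefix, a plain string comparison may be enough and the regex can be dropped. The following is a sketch under that assumption. WeirdEltLinq and Extract are hypothetical names, and it looks only at direct children of Parent, as in the sample XML.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, shown only for illustration.
public static class WeirdEltLinq
{
    // Returns (element name, trimmed text) pairs for every direct child
    // of a Parent element whose local name starts with "Weird-Elt_".
    public static List<KeyValuePair<string, string>> Extract(string xml)
    {
        XDocument doc = XDocument.Parse(xml);
        return doc.Descendants("Parent")
                  .Elements()
                  .Where(e => e.Name.LocalName.StartsWith(
                      "Weird-Elt_", StringComparison.Ordinal))
                  .Select(e => new KeyValuePair<string, string>(
                      e.Name.LocalName, e.Value.Trim()))
                  .ToList();
    }
}
```

A regex is still the better fit when the suffix has to follow a stricter format, such as digits only.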

Hope it helps.

Cinchoo