1

Update

I want an expression (XPath, regex, or similar) that can match an XML element in a particular namespace. For example, I want to get the value of the link element shown below (e.g. I need the http://url within <b:link>http://url</b:link>). However, the namespace prefix varies between XML files, as shown in cases 1-3.

Considering which characters are allowed in a namespace prefix (e.g. is any character valid?), could anyone provide a solution (XPath, regex, or similar)?

Please note that because the XML file is unknown, the namespace and prefix are unknown until runtime. Does that mean I cannot use XDocument/XmlDocument, since they require the namespace to be known in the code?

Update

Case 1

<A xmlns:b="link">
<b:link>http://url
</b:link>
</A>

Case 2

<A xmlns="link">
<link>http://url
</link>
</A>

Case 3

<A xmlns:a123="link">
<a123:link>http://url
</a123:link>
</A>

Please note that the url within the link element could be any http url, and unknown until runtime.

Update

Please mark up my question.

Pingpong

2 Answers

6

You need to know the namespaces you will be dealing with and register them with an XmlNamespaceManager. Here is an example:

    XmlDocument doc = new XmlDocument();
    doc.LoadXml("<A xmlns:b='link'><b:Books /></A>");
    XmlNamespaceManager nsmgr = new XmlNamespaceManager(doc.NameTable);
    nsmgr.AddNamespace("b", "link");

    XmlNodeList books = doc.SelectNodes("//b:Books", nsmgr);
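
When the prefix is not known until runtime, one option is an XPath that tests `local-name()` (and, if the namespace URI is available, `namespace-uri()`) instead of a prefixed name, so no prefix has to be registered. The following is a minimal sketch, assuming the element is called `link` and the namespace URI is `link` as in the question's three cases; the class name and the `FindLink` helper are illustrative only.

```csharp
using System;
using System.Xml;

class LocalNameXPathDemo
{
    // Hypothetical helper: returns the trimmed text of the first <link>
    // element in the given namespace, whatever prefix the document uses.
    static string FindLink(string xml, string namespaceUri)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        // Assumption: namespaceUri contains no single quote, so it can be
        // embedded in the XPath string literal as-is.
        string xpath = "//*[local-name()='link' and namespace-uri()='"
                       + namespaceUri + "']";

        XmlNode node = doc.SelectSingleNode(xpath);
        return node == null ? null : node.InnerText.Trim();
    }

    static void Main()
    {
        string[] cases =
        {
            "<A xmlns:b='link'><b:link>http://url\n</b:link></A>",
            "<A xmlns='link'><link>http://url\n</link></A>",
            "<A xmlns:a123='link'><a123:link>http://url\n</a123:link></A>"
        };

        foreach (string xml in cases)
        {
            Console.WriteLine(FindLink(xml, "link"));
        }
    }
}
```

Each of the three cases should print `http://url`. Dropping the `namespace-uri()` test matches any element named `link` regardless of namespace, which may or may not be what you want.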

And if you want to do this using XDocument, which I would recommend for its brevity, here is how:

    XDocument xDoc = XDocument.Parse("<A xmlns:b='link'><b:Books /></A>");
    XNamespace ns = "link";
    var books = xDoc.Descendants(ns + "Books");

If you do not know the namespace(s) ahead of time, see this post which shows how to query across an XDocument using only the local name. Here's an example:

    XDocument xDoc = XDocument.Parse("<A xmlns:b='link'><b:Books /></A>");
    var books = xDoc.Descendants().Where(e => e.Name.LocalName.ToLower() == "books");
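
Applied to the `link` element from the question, the same idea might look like the sketch below. It assumes that the namespace URI, if it becomes known at runtime, arrives as an ordinary string (`XNamespace.Get` accepts one), and it falls back to matching on the local name alone when no URI is given. The class name and the `FindLinks` helper are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

class RuntimeNamespaceDemo
{
    // Hypothetical helper: namespaceUri may be null when it is not known.
    static IEnumerable<string> FindLinks(string xml, string namespaceUri)
    {
        XDocument xDoc = XDocument.Parse(xml);

        IEnumerable<XElement> links = namespaceUri != null
            // Namespace known at runtime: build the XName from the string.
            ? xDoc.Descendants(XNamespace.Get(namespaceUri) + "link")
            // Namespace unknown: compare the local name only.
            : xDoc.Descendants().Where(e => e.Name.LocalName == "link");

        // Trim the line break that follows the URL in the sample documents.
        return links.Select(e => e.Value.Trim());
    }

    static void Main()
    {
        string[] cases =
        {
            "<A xmlns:b='link'><b:link>http://url\n</b:link></A>",
            "<A xmlns='link'><link>http://url\n</link></A>",
            "<A xmlns:a123='link'><a123:link>http://url\n</a123:link></A>"
        };

        foreach (string xml in cases)
        {
            Console.WriteLine(string.Join(", ", FindLinks(xml, "link")));
            Console.WriteLine(string.Join(", ", FindLinks(xml, null)));
        }
    }
}
```

The prefix never appears in the query: LINQ to XML compares the namespace URI and local name, so `b:link`, `link` in a default namespace, and `a123:link` are all treated as the same element name.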
chase huber
  • Because the XML file is unknown, the namespace and prefix are unknown until runtime. Does that mean I cannot use this approach? – Pingpong Jan 25 '12 at 11:40
  • Please see my updated answer, it should work when you don't know the namespaces. XDocument is a nice API for processing Xml. – chase huber Jan 25 '12 at 20:12
1

Use an XML parser, not a regex.

That being said, you could use:

    <(?:(.+?):)?Books />

The namespace prefix, if there is one, would then be in capture group 1.

In fact, I'd more strongly recommend

    <(?:([^<>]+?):)?Books />

to avoid mistakes such as matching across another set of XML tags (who would use <> in a namespace prefix anyway?!).
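
For the `link` element in the question's cases, a comparable pattern can be sketched in C# as below. This is only an illustration under several assumptions: the element has no attributes, its content contains no nested tags, and the namespace URI is ignored entirely because only the prefix is visible to the regex. The class and field names are made up for the example.

```csharp
using System;
using System.Text.RegularExpressions;

class LinkRegexDemo
{
    // Group 1: optional namespace prefix. Group 2: element text.
    // (?:\1:)? repeats the captured prefix in the closing tag; when group 1
    // did not capture, the .NET backreference never matches and the optional
    // group is simply skipped.
    static readonly Regex LinkPattern = new Regex(
        @"<(?:([^<>:\s]+):)?link\s*>\s*(.*?)\s*</(?:\1:)?link\s*>",
        RegexOptions.Singleline); // let '.' match the line break after the URL

    static void Main()
    {
        string[] cases =
        {
            "<A xmlns:b='link'><b:link>http://url\n</b:link></A>",
            "<A xmlns='link'><link>http://url\n</link></A>",
            "<A xmlns:a123='link'><a123:link>http://url\n</a123:link></A>"
        };

        foreach (string xml in cases)
        {
            Match m = LinkPattern.Match(xml);
            if (m.Success)
            {
                string prefix = m.Groups[1].Success ? m.Groups[1].Value : "(none)";
                Console.WriteLine(prefix + " -> " + m.Groups[2].Value);
            }
        }
    }
}
```

Comments, CDATA sections, and elements named `link` in a different namespace would all confuse this pattern, which is the main reason to prefer a parser.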

mathematical.coffee
  • If you have XDocument/XMLDocument then regex is entirely unnecessary, that's the point of such parsers. I'm not a C# guru so someone else will have to help you out on the usage of that class. – mathematical.coffee Jan 25 '12 at 00:56
  • I changed the element to <b:link>http://url</b:link> here, rather than <b:Books />. What changes need to be made? Please refer to my updated post on the cases. Thanks – Pingpong Jan 25 '12 at 13:50
  • What do you want to extract? the URL? (your question still says you want `Book` tags). I'll give you a regex to do that if you want, but I'd recommend @GemCer's answer over mine as being much, *much*, **much** better for parsing XML than regex. – mathematical.coffee Jan 26 '12 at 13:07
  • Thanks! @GemCer has a better solution. – Pingpong Jan 26 '12 at 17:38