I have an XHTML file that starts with:

<html xmlns="http://www.w3.org/1999/xhtml">

I load it:

XmlDocument xml = new XmlDocument();
StringReader sr = new StringReader(html);
XmlTextReader xmltr = new XmlTextReader(sr);
xmltr.Namespaces = false;
xml.Load(xmltr);

When I call xml.InnerXml I always get the exception "The 'xmlns' attribute is bound to the reserved namespace 'http://www.w3.org/2000/xmlns/'." and can't access the inner XML of my XmlDocument. How can I get rid of xmlns during load?

SOLUTION IS:

XmlNode xmln = xml.SelectSingleNode("//html");
if (xmln != null)
    ((XmlElement)xmln).RemoveAttribute("xmlns");
Denis
  • Your XHTML declaration talks about `http://www.w3.org/1999/xhtml` but the error you've described talks about `http://www.w3.org/2000/xmlns` - are you sure they're both correct? – Jon Skeet Jan 27 '12 at 11:24
  • Yes, my third-party XHTML has 1999 and exception says 2000. – Denis Jan 27 '12 at 11:34
  • The more important bit is the "xhtml" vs "xmlns" bit... – Jon Skeet Jan 27 '12 at 11:36
  • Yes, and it does not work. You will always get an exception when you try to touch this node in any way. But everything is okay for all other nodes. You can still get any inner node through XPath (as I do). How to fix it? – Denis Jan 27 '12 at 12:58
  • Okay, I've just reproduced the problem - and it goes away if you get rid of the "Namespaces = false" line. Why are you doing that? – Jon Skeet Jan 27 '12 at 13:17
  • I do not remember. But xml.SelectSingleNode("//title") does not work without that; I'm currently looking into it. I'd appreciate your help if you know a solution. – Denis Jan 27 '12 at 13:38
  • Well yes, you'd need to change your XPath to take account of namespaces... or use something else. Personally I prefer using LINQ to XML for XML work... are you able to use that instead? – Jon Skeet Jan 27 '12 at 13:43
  • I already have a fairly large amount of working code, and XmlDocument fits my needs. I use SelectSingleNode to process some deep nodes. Currently I'm investigating why XPath stopped working. I still haven't found the answer. – Denis Jan 27 '12 at 13:57
  • Well "//title" looks for a title element without a namespace. You need to search *with* namespaces - see http://stackoverflow.com/questions/561822/xpath-on-an-xml-document-with-namespace for example. – Jon Skeet Jan 27 '12 at 14:00
  • `XmlNode xmln = xml.SelectSingleNode("//html"); if (xmln != null) ((XmlElement)xmln).RemoveAttribute("xmlns");` did the trick. Thank you Jon. – Denis Jan 27 '12 at 14:46

1 Answer

At a guess, the page that you are trying to parse has recently changed to XHTML, hence the namespaces?

As per @JonSkeet, you shouldn't set xmltr.Namespaces = false; on your XmlTextReader.

You can either

  • embrace namespaces and use XmlNamespaceManager to manage the XHTML (xmlns="http://www.w3.org/1999/xhtml") namespace.
  • use namespace-agnostic XPath such as local-name(), which will ignore the namespace: *

 xml.SelectSingleNode("/*[local-name()='html']/*[local-name()='body']")

Either way, your code will need to change to adapt to the namespaces, unless you hack the namespace out of the XML before you load it.

* You can also use // with local-name() but be careful with documents with large numbers of elements - this can become very slow.
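For the first option, a minimal sketch of a namespace-aware query might look like the following. The class name XhtmlQuery, the helper FindTitle, the prefix "x", and the sample markup are all assumptions made for illustration; only the namespace URI comes from the question. The document is loaded with namespaces left enabled, and the prefix is mapped to the XHTML namespace so the XPath can match namespaced elements.

```csharp
using System;
using System.Xml;

public static class XhtmlQuery
{
    // Hypothetical helper: returns the text of the first <title> element
    // in the XHTML namespace, or null if there is none.
    public static string FindTitle(string html)
    {
        var xml = new XmlDocument();
        xml.LoadXml(html); // namespaces stay enabled (the default)

        // "x" is an arbitrary prefix chosen for the XPath expression;
        // it does not need to appear in the document itself.
        var ns = new XmlNamespaceManager(xml.NameTable);
        ns.AddNamespace("x", "http://www.w3.org/1999/xhtml");

        XmlNode title = xml.SelectSingleNode("//x:title", ns);
        return title == null ? null : title.InnerText;
    }

    public static void Main()
    {
        // Made-up sample input standing in for the third-party XHTML.
        string html =
            "<html xmlns=\"http://www.w3.org/1999/xhtml\">" +
            "<head><title>Example</title></head><body/></html>";

        Console.WriteLine(FindTitle(html)); // prints "Example"
    }
}
```

With this approach the xmlns attribute stays in place, so xml.InnerXml can be read without removing anything. The cost is that every existing XPath expression needs the prefix and the manager passed to SelectSingleNode.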

StuartLC
  • Thank you, nonnb. I already got rid of namespace attribute using `((XmlElement)xmln).RemoveAttribute("xmlns");` – Denis Jan 27 '12 at 14:50