I've got some XML (valid XHTML) that looks like this:

<html>
    <head>
        <script type="text/javascript">
            <![CDATA[
                function change_header(){
                    document.getElementById("myHeader").innerHTML="Nice day!";
                }]]>
        </script>
    </head>
    <body>
        <h1 id="myHeader">Hello World!</h1>
        <button onclick="change_header()">Change text</button>
    </body>
</html>

And I'm trying to get the #myHeader node using document.GetElementById("myHeader"), but it always returns null. Why?

I'm guessing it doesn't recognize the id attribute as the id attribute without a DTD or something? If that's the case, how can I get it to use an HTML DTD?

Tajkia Rahman Toma
mpen
  • Same as [GetElementById() not finding the tag?](http://stackoverflow.com/questions/2003185/getelementbyid-not-finding-the-tag). – Matthew Flaschen Sep 23 '10 at 06:09
  • Matthew, I don't think it's the same. This one works by just removing the CDATA in both Firefox and Chrome. – Gonzalo Sep 23 '10 at 06:20
  • @Gonzalo: You've completely misinterpreted the question. Have a look at the tags. This has nothing to do with JavaScript. I'm trying to parse the HTML in C#. – mpen Sep 23 '10 at 06:36

1 Answer

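It's because XmlDocument doesn't know which attribute is an ID. GetElementById only matches attributes that a DTD declares as type ID; an attribute that is merely named id doesn't count. You need to include a DTD in your XHTML document. Just put the following at the beginning of your HTML file: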

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

Example:

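using System.Xml;

// The DOCTYPE pulls in the XHTML DTD, which declares id as an ID attribute.
string html = @"<!DOCTYPE html PUBLIC ""-//W3C//DTD XHTML 1.0 Transitional//EN"" ""http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd""><html><body><div id=""foo"">some content</div></body></html>";
XmlDocument document = new XmlDocument();
document.LoadXml(html);
// With the DTD in place this returns the div; without it the result is null.
XmlElement div = document.GetElementById("foo");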

Note that this might be a little slower, because the DTD has to be downloaded when the document is loaded.
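
If the download is a concern, one possible variation (not part of the original answer) is to declare the ID attribute in an internal DTD subset, so nothing has to be fetched. The sketch below assumes only h1 elements need to be found; an ATTLIST declaration applies to a single element name, so every element type that carries an id would need its own line. The class and method names are made up for illustration.

```csharp
using System;
using System.Xml;

// Hypothetical example class, not from the original answer.
public static class InlineDtdExample
{
    // Builds a document whose internal DTD subset declares "id" as an ID
    // attribute on h1 elements only (an assumption for this sketch).
    public static XmlDocument Load()
    {
        string xhtml =
            "<!DOCTYPE html [" +
            "<!ELEMENT html ANY>" +
            "<!ELEMENT body ANY>" +
            "<!ELEMENT h1 ANY>" +
            "<!ATTLIST h1 id ID #IMPLIED>" +
            "]>" +
            "<html><body><h1 id=\"myHeader\">Hello World!</h1></body></html>";

        var document = new XmlDocument();
        document.LoadXml(xhtml);
        return document;
    }

    public static void Main()
    {
        // GetElementById can resolve the id because the DTD subset types it as ID.
        XmlElement header = Load().GetElementById("myHeader");
        Console.WriteLine(header != null ? header.InnerText : "(not found)");
    }
}
```

This trades completeness for speed: the full XHTML DTD covers every element, while the inline subset only covers what it lists.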

Darin Dimitrov
  • The document is coming from the web in the form of a stream. Is there another way to set the doctype? – mpen Sep 23 '10 at 06:12
  • I am afraid you will need to load it into memory, prepend the correct DTD and then load it into an XmlDocument. Of course, if you intend to parse HTML I would recommend using [Html Agility Pack](http://htmlagilitypack.codeplex.com/). Using XmlDocument for parsing invalid web pages (ones without a DTD, for example) is a perilous task. – Darin Dimitrov Sep 23 '10 at 06:13
  • Trying `SgmlReader` instead. Wasn't too fond of HtmlAgilityPack. – mpen Sep 23 '10 at 06:33
  • I'm marking this as the accepted answer to close this thread, but I went to look for the "id" attribute instead because it's easier. – mpen Sep 25 '10 at 04:59
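
The workaround in the last comment, looking the element up by its "id" attribute rather than relying on ID typing, might look roughly like the sketch below. It uses an XPath query, which needs no DTD. The helper name FindById is hypothetical, and the sketch assumes the id value contains no single quote, since it is spliced into the XPath string literal.

```csharp
using System;
using System.Xml;

// Hypothetical helper class, shown only to illustrate the attribute lookup.
public static class IdLookup
{
    // Returns the first element whose "id" attribute equals the given value,
    // or null if there is none. Assumes id contains no single quote.
    public static XmlElement FindById(XmlDocument document, string id)
    {
        return document.SelectSingleNode("//*[@id='" + id + "']") as XmlElement;
    }

    public static void Main()
    {
        var document = new XmlDocument();
        document.LoadXml(
            "<html><body><h1 id=\"myHeader\">Hello World!</h1></body></html>");

        XmlElement header = FindById(document, "myHeader");
        Console.WriteLine(header != null ? header.InnerText : "(not found)");
    }
}
```

Unlike GetElementById, this scans the whole tree on every call and does not enforce uniqueness, so it simply returns the first match in document order.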