I wanted to use the HTMLDocument object from the mshtml library. I was trying to assign HTML to the document:

var doc = new mshtml.HTMLDocument();
var html = File.ReadAllText(@"path_to_html_file");
doc.body.innerHTML = html; // <-- this line throws error

However, I get an error on the third line:

System.NullReferenceException: 'Object reference not set to an instance of an object.'
mshtml.DispHTMLDocument.body.get returned null.

I also tried late binding with dynamic, but it didn't work either:

dynamic doc = Activator.CreateInstance(Type.GetTypeFromProgID("htmlfile"));

In this case I get the following error:

Microsoft.CSharp.RuntimeBinder.RuntimeBinderException:
'Cannot perform runtime binding on a null reference'

Is there a way to overcome this problem? Thanks!

UPDATE: VBA code

Sub GetData()
    Dim doc As MSHTML.HTMLDocument
    Dim fso As FileSystemObject, txt As TextStream

    Set doc = New MSHTML.HTMLDocument
    Set fso = New FileSystemObject
    Set txt = fso.OpenTextFile("path_to_html_file")
    doc.body.innerHTML = txt.ReadAll() '// <-- No error here
    txt.Close
End Sub
JohnyL

2 Answers

You could cast the mshtml.HTMLDocument to the IHTMLDocument2 interface, which exposes the document's main properties and methods:

var doc = (IHTMLDocument2)new mshtml.HTMLDocument();

Or create an HTMLDocumentClass instance using Activator.CreateInstance() with the type GUID, then cast it to the IHTMLDocument2 interface.

IHTMLDocument2 doc = 
   (IHTMLDocument2)Activator.CreateInstance(
       Type.GetTypeFromCLSID(new Guid("25336920-03F9-11CF-8FD0-00AA00686F13")));

It's more or less the same thing. I'd prefer the first one, mainly for this reason.

Then you can write whatever you want to the HTMLDocument. For example:

doc.write(File.ReadAllText(@"[Some Html Page]"));
Console.WriteLine(doc.body.innerText);

To create an HTMLDocument, a skeleton HTML page is enough, something like this:

string html = "<!DOCTYPE html><html><head></head><body><p></p></body></html>";
doc.write(html);

Note: until some HTML has been written to the document, all its elements, including body, are null.

After that, you can set body.innerHTML to something else:

doc.body.innerHTML = "<P>Some Text</P>";
Console.WriteLine(doc.body.innerText);
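Putting the steps above together, a minimal sketch might look like the following. The HtmlLoader class and its Load method are hypothetical names used only for illustration, and the call to doc.close() after writing is an assumption on my part (it ends the write stream); the snippets above do not include it. The project is assumed to reference the Microsoft HTML Object Library.

```csharp
using System;
using System.IO;
using mshtml; // COM reference: Microsoft HTML Object Library

static class HtmlLoader
{
    // Hypothetical helper: loads an HTML file into an in-memory MSHTML document.
    public static IHTMLDocument2 Load(string path)
    {
        // Cast to IHTMLDocument2 to get write(), close() and body.
        var doc = (IHTMLDocument2)new HTMLDocument();

        // body is null until some HTML has been written to the document.
        doc.write(File.ReadAllText(path));
        doc.close(); // assumption: close the stream opened by write()

        return doc;
    }

    [STAThread] // MSHTML is an apartment-threaded COM component
    static void Main()
    {
        IHTMLDocument2 doc = Load(@"path_to_html_file");

        // Guard against a document that still has no body.
        if (doc.body != null)
        {
            doc.body.innerHTML = "<p>Some Text</p>";
            Console.WriteLine(doc.body.innerText);
        }
    }
}
```

The null check is there because, as the question shows, reading body on an empty document is exactly what throws the NullReferenceException.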

Note that if you need to work with the HTML document more extensively, you'll have to cast to a higher-level interface: IHTMLDocument3 to IHTMLDocument8 (as of now), depending on the system version.

The classic getElementById, getElementsByName and getElementsByTagName methods are available in the IHTMLDocument3 interface.

For example, use getElementsByTagName() to retrieve the innerText of an HTML element by its tag name:

string innerText = 
   (doc as IHTMLDocument3).getElementsByTagName("body")
                          .OfType<IHTMLElement>().First().innerText;
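getElementById works the same way through IHTMLDocument3. Here is a small sketch, assuming a page with an element whose id is "greeting" (a made-up id for this example) and, as an assumption of mine, a doc.close() call after writing:

```csharp
using System;
using mshtml; // COM reference: Microsoft HTML Object Library

static class ElementLookup
{
    [STAThread] // MSHTML is an apartment-threaded COM component
    static void Main()
    {
        var doc = (IHTMLDocument2)new HTMLDocument();

        // Write a skeleton page first; "greeting" is a hypothetical id.
        doc.write("<!DOCTYPE html><html><head></head><body>" +
                  "<p id=\"greeting\">Some Text</p></body></html>");
        doc.close(); // assumption: close the stream opened by write()

        // getElementById lives on IHTMLDocument3, not on IHTMLDocument2.
        var doc3 = (IHTMLDocument3)doc;
        IHTMLElement paragraph = doc3.getElementById("greeting");

        // The lookup returns null when no element has that id.
        Console.WriteLine(paragraph?.innerText);
    }
}
```

The same underlying object implements both interfaces, so the cast does not create a second document.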

Note:
If you can't find the IHTMLDocument6, IHTMLDocument7 and IHTMLDocument8 interfaces (and possibly other interfaces referenced in the MSDN Docs), you probably have an old type library in the \Windows\Assembly\ GAC. Follow Hans Passant's advice to create a new Interop.mshtml library:
How to get mshtml.IHTMLDocument6 or mshtml.IHTMLDocument7?

Jimi
  • Thanks for the answer, but unfortunately IntelliSense shows `IHTMLDocument5` as the highest version, and it has neither a `body` property nor a `getElementById` method. Also, `Activator.CreateInstance` didn't help either: the error says that `body` is null. – JohnyL Jan 27 '19 at 16:13
  • Cast to `IHtmlDocument2`, as shown in the example. The other interfaces offer different methods and properties. See the Docs about what's available per Interface. – Jimi Jan 27 '19 at 16:14
  • `IHtmlDocument2` does support `body`, but it's null again. Moreover, it doesn't support `getElementById`! :) – JohnyL Jan 27 '19 at 16:17
  • I guess, I need to stick with VBA or use HtmlAgilityPack :) – JohnyL Jan 27 '19 at 16:17
  • What system and Framework/mshtml versions are you using? This is quite a classic setup. Also, did you `.write` to the doc before setting anything? Do you have a `using mshtml;` in your code? – Jimi Jan 27 '19 at 16:19
  • Note, from the sample code, that `Body` is `null` unless you write some HTML into the Document. You can't set the Body to anything if the Document doesn't contain an HTML page. A simple skeleton of the page (which should start with `<!DOCTYPE html> (...)`) is enough. – Jimi Jan 27 '19 at 16:26
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/187392/discussion-between-johnyl-and-jimi). – JohnyL Jan 27 '19 at 16:30
1

I ran into the System.NullReferenceException too, because doc.body was null. I finally resolved the problem this way:

    public void SetWebBrowserHtml(WebBrowser webBrowser, string html)
    {
        if (!(webBrowser.Document is MSHTML.IHTMLDocument2))
        {
            webBrowser.Navigate("about:blank");
        }
        if (webBrowser.Document is MSHTML.IHTMLDocument2 doc)
        {
            if (doc.body == null)
            {
                doc.write(html);
            }
            else
            {
                doc.body.innerHTML = html;
            }
        }
    }
Alex Vazhev