How can I get HTML source code from TWebBrowser

Question

How can I get the source code from a WebBrowser component?

I want to get the source code of the active page in a WebBrowser component and write it to a Memo component.

Thanks.

score 21 · Accepted Answer · answered Apr 10 '12 at 15:40

You can use the IPersistStreamInit interface and its Save method to store the content of the WebBrowser in a stream.

    uses
      ActiveX, Classes, SHDocVw;

    function GetWebBrowserHTML(const WebBrowser: TWebBrowser): String;
    var
      LStream: TStringStream;
      Stream: IStream;
      LPersistStreamInit: IPersistStreamInit;
    begin
      Result := ''; // defined result when no document is loaded
      if not Assigned(WebBrowser.Document) then Exit;
      LStream := TStringStream.Create('');
      try
        LPersistStreamInit := WebBrowser.Document as IPersistStreamInit;
        // soReference: the adapter does not own LStream, so it is freed below
        Stream := TStreamAdapter.Create(LStream, soReference);
        LPersistStreamInit.Save(Stream, True);
        Result := LStream.DataString;
      finally
        LStream.Free;
      end;
    end;

answered Apr 10 '12 at 15:40 by RRUZ

How can we make it work the REVERSE way: SetWebBrowserHTML, thus re-injecting the previously extracted code back to WebBrowser (or TEmbeddedWebBrowser). I imagine the following situation: A memo component gets the HTML source code with GetWebBrowserHTML, then the user makes some changes to the source code, then the changed source code is re-injected back into WebBrowser. This would make a nice HTML editor with real-time preview in the browser! – user1580348 May 14 '13 at 01:12
Better: `LStream := TStringStream.Create('', TEncoding.UTF8);` – user1580348 May 20 '13 at 10:17
@user1580348 If you wanted to "reverse" it, all you need to change is LPersistStreamInit.Save to LPersistStreamInit.Load and initialize the TStringStream with something (or pass in a different stream). – tmjac2 May 06 '16 at 22:40
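
The "reverse" direction described in the comment above might look roughly like the following. This is only a sketch: the name SetWebBrowserHTML comes from the question in the comments, not from any library, and the code assumes the browser already holds a document (for example after a completed Navigate('about:blank')), since Document is otherwise nil.

```delphi
uses
  ActiveX, Classes, ComObj, SysUtils, SHDocVw;

// Hypothetical counterpart to GetWebBrowserHTML: loads HTML text into
// the document that the browser currently holds.
procedure SetWebBrowserHTML(const WebBrowser: TWebBrowser; const HTML: String);
var
  LStream: TStringStream;
  Stream: IStream;
  LPersistStreamInit: IPersistStreamInit;
begin
  // Assumption: a document already exists; Supports returns False for nil
  if not Supports(WebBrowser.Document, IPersistStreamInit, LPersistStreamInit) then
    Exit;
  LStream := TStringStream.Create(HTML);
  try
    LStream.Position := 0; // make sure Load reads from the start
    // soReference: the adapter does not own LStream, so it is freed below
    Stream := TStreamAdapter.Create(LStream, soReference);
    OleCheck(LPersistStreamInit.Load(Stream)); // raises on a failed HRESULT
  finally
    LStream.Free;
  end;
end;
```

A call such as `SetWebBrowserHTML(WebBrowser1, Memo1.Text)` would then push the edited text back. How non-ASCII characters survive the round trip depends on the stream's encoding, which this sketch leaves at the default.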

score 6 · Answer 2 · edited Sep 11 '20 at 03:18

That works well too:

    uses MSHTML, SHDocVw;

    function GetHTML(w: TWebBrowser): String;
    var
      e: IHTMLElement;
    begin
      Result := '';
      if Assigned(w.Document) then
      begin
        e := (w.Document as IHTMLDocument2).body;
        if e = nil then Exit; // no body available (e.g. page still loading)

        // Walk up from <body> to the root element
        while e.parentElement <> nil do
          e := e.parentElement;

        Result := e.outerHTML;
      end;
    end;

edited Sep 11 '20 at 03:18 by Robert Christopher

answered Mar 18 '13 at 19:25 by Mehmet Fide

Wrong. This will get you the DOM representation of the `document` element. It will not be the HTML source code. – kobik Mar 18 '13 at 19:36
Yes you are right, I was using it just to parse some data available on html source and using DOM representation was ok for that. – Mehmet Fide Mar 20 '13 at 03:42
I'll upvote your answer; it's useful in any case. I also use a similar method in our spider to manipulate/parse HTML from a foreign web site. – kobik Mar 20 '13 at 13:00
I had to upvote this because the page I was trying to get the source code from had its content changed by JavaScript, so @rruz's suggestion didn't work, as it returned the original HTML instead of the changed one. Thank you. – Fernando M. Pinheiro Mar 20 '16 at 17:10

score 3 · Answer 3 · answered Apr 10 '12 at 15:40

This has been asked and answered many times in the Embarcadero forums, with plenty of code examples posted. Search the archives.

The gist of it is that you call Navigate() with the desired URL and wait for the OnDocumentComplete event to fire, then QueryInterface() the Document property for the IPersistStreamInit interface and call its Save() method. Create a TStream object instance, such as a TMemoryStream, wrap it in a TStreamAdapter object, and then pass the adapter to Save(). You can then load the TStream into the TMemo as needed.
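
The steps above could be sketched as an OnDocumentComplete handler like the one below. The form and component names (TForm1, WebBrowser1, Memo1) are assumptions for illustration, and the handler signature differs between Delphi versions (older ones declare URL as a `var` parameter), so use whatever the IDE generates.

```delphi
uses
  ActiveX, Classes, ComObj, SysUtils, SHDocVw;

// Hypothetical handler on a form that owns WebBrowser1 and Memo1.
procedure TForm1.WebBrowser1DocumentComplete(ASender: TObject;
  const pDisp: IDispatch; const URL: OleVariant);
var
  Persist: IPersistStreamInit;
  Data: TMemoryStream;
  Adapter: IStream;
begin
  // QueryInterface the document for IPersistStreamInit; bail out if missing
  if not Supports(WebBrowser1.Document, IPersistStreamInit, Persist) then
    Exit;
  Data := TMemoryStream.Create;
  try
    // soReference: the adapter does not own Data, so it is freed below
    Adapter := TStreamAdapter.Create(Data, soReference);
    OleCheck(Persist.Save(Adapter, True)); // raises on a failed HRESULT
    Data.Position := 0;                    // rewind before reading back
    Memo1.Lines.LoadFromStream(Data);
  finally
    Data.Free;
  end;
end;
```

Note that OnDocumentComplete can also fire for frames, so a page with frames may trigger the handler more than once. LoadFromStream uses the default encoding unless the stream starts with a byte-order mark, so pages with non-ASCII text may need an explicit encoding.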

score 2 · Answer 4 · edited Oct 06 '20 at 15:28

Why not quick and dirty? In OnNavigateComplete2():

    Form1.RichEdit1.Text := WebBrowser1.OleObject.Document.documentElement.outerHTML;

edited Oct 06 '20 at 15:28 by Adrian Mole

answered Oct 06 '20 at 14:48 by MasterDiesel

This simple version works much better on UTF-8 encoded pages with non-ASCII text. – Kevin Davidson Mar 29 '21 at 13:53
