13

How can I get the source code from a WebBrowser component?

I want to get the source code of the active page in a WebBrowser component and write it to a Memo component.

Thanks.

Someone

4 Answers

21

You can use the IPersistStreamInit interface and its Save method to store the content of the WebBrowser in a stream.

uses
  Classes, ActiveX, SHDocVw;

// Returns the HTML that the browser persists for the current document.
function GetWebBrowserHTML(const WebBrowser: TWebBrowser): String;
var
  LStream: TStringStream;
  Stream: IStream;
  LPersistStreamInit: IPersistStreamInit;
begin
  Result := ''; // defined result when no document is loaded yet
  if not Assigned(WebBrowser.Document) then Exit;
  LStream := TStringStream.Create('');
  try
    LPersistStreamInit := WebBrowser.Document as IPersistStreamInit;
    // soReference: the adapter does not own LStream, so it is freed below
    Stream := TStreamAdapter.Create(LStream, soReference);
    LPersistStreamInit.Save(Stream, True);
    Result := LStream.DataString;
  finally
    LStream.Free;
  end;
end;
RRUZ
  • How can we make it work the REVERSE way, with a SetWebBrowserHTML that re-injects the previously extracted code back into WebBrowser (or TEmbeddedWebBrowser)? I imagine the following situation: a memo component gets the HTML source code with GetWebBrowserHTML, the user makes some changes to the source code, and the changed source code is then re-injected into WebBrowser. This would make a nice HTML editor with real-time preview in the browser! – user1580348 May 14 '13 at 01:12
  • Better: `LStream := TStringStream.Create('', TEncoding.UTF8);` – user1580348 May 20 '13 at 10:17
  • @user1580348 If you wanted to "reverse" it, all you need to change is LPersistStreamInit.Save to LPersistStreamInit.Load, and initialize the TStringStream with something (or pass in a different stream). – tmjac2 May 06 '16 at 22:40
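
The reverse direction described in the last comment could be sketched roughly as follows. This is only an illustration, not code from the thread: `SetWebBrowserHTML` is a hypothetical name, the `InitNew` call before `Load` is an assumption about how MSHTML expects to be reinitialized, and the sketch assumes a document already exists (for example after navigating to `about:blank`). Encoding handling is left out.

```delphi
uses
  Classes, ActiveX, ComObj, SHDocVw;

// Hypothetical counterpart to GetWebBrowserHTML: loads HTML text into the
// document that the browser currently holds.
procedure SetWebBrowserHTML(const WebBrowser: TWebBrowser; const HTML: String);
var
  LStream: TStringStream;
  Stream: IStream;
  LPersistStreamInit: IPersistStreamInit;
begin
  // Assumption: a document is already present, e.g. after
  // WebBrowser.Navigate('about:blank') has completed.
  if not Assigned(WebBrowser.Document) then Exit;
  LStream := TStringStream.Create(HTML);
  try
    LStream.Position := 0; // make sure Load reads from the start
    LPersistStreamInit := WebBrowser.Document as IPersistStreamInit;
    Stream := TStreamAdapter.Create(LStream, soReference);
    OleCheck(LPersistStreamInit.InitNew);      // assumption: reset before Load
    OleCheck(LPersistStreamInit.Load(Stream)); // Load instead of Save
  finally
    LStream.Free;
  end;
end;
```

With both functions in place, an edit-and-preview loop would read `Memo1.Text := GetWebBrowserHTML(WebBrowser1)` and later call `SetWebBrowserHTML(WebBrowser1, Memo1.Text)`; `Memo1` and `WebBrowser1` are placeholder component names.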
6

That works well too:

    uses MSHTML, SHDocVw;

    function GetHTML(w: TWebBrowser): String;
    var
      e: IHTMLElement;
    begin
      Result := '';
      if Assigned(w.Document) then
      begin
        e := (w.Document as IHTMLDocument2).body;
        if e = nil then Exit; // no body element available

        // Walk up to the root element (normally <html>)
        while e.parentElement <> nil do
          e := e.parentElement;

        Result := e.outerHTML;
      end;
    end;
Robert Christopher
Mehmet Fide
  • Wrong. This will get you the DOM representation of the `document` element. It will not be the HTML source code. – kobik Mar 18 '13 at 19:36
  • Yes, you are right. I was using it just to parse some data available in the HTML source, and using the DOM representation was OK for that. – Mehmet Fide Mar 20 '13 at 03:42
  • I'll upvote your answer; it's useful in any case. I also use a similar method in our spider to manipulate/parse HTML from a foreign web site. – kobik Mar 20 '13 at 13:00
  • I had to upvote this because the page I was trying to get the source code from had its content changed by JavaScript, so @rruz's suggestion didn't work, as it returned the original HTML instead of the changed one. Thank you. – Fernando M. Pinheiro Mar 20 '16 at 17:10
3

This has been asked and answered many times in the Embarcadero forums, with plenty of code examples posted. Search the archives.

The gist of it is that you Navigate() to the desired URL and wait for the OnDocumentComplete event to fire, then QueryInterface() the Document property for the IPersistStreamInit interface and call its Save() method. Create a TStream object instance, such as a TMemoryStream, wrap it in a TStreamAdapter object, and then pass the adapter to Save(). You can then load the TStream into the TMemo as needed.
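
A minimal sketch of those steps, meant to be called from an OnDocumentComplete handler. The helper name `LoadBrowserSourceIntoMemo` is hypothetical, and the exact handler signature is omitted because it differs between Delphi versions. `Supports` is used here as a convenient way to perform the QueryInterface().

```delphi
uses
  Classes, SysUtils, ActiveX, ComObj, SHDocVw, StdCtrls;

// Hypothetical helper: saves the browser's current document into a memory
// stream and loads that stream into a memo.
procedure LoadBrowserSourceIntoMemo(WebBrowser: TWebBrowser; Memo: TMemo);
var
  MS: TMemoryStream;
  Adapter: IStream;
  Persist: IPersistStreamInit;
begin
  // QueryInterface the Document for IPersistStreamInit
  if not Supports(WebBrowser.Document, IPersistStreamInit, Persist) then
    Exit;
  MS := TMemoryStream.Create;
  try
    // Wrap the TStream so MSHTML can write to it through IStream
    Adapter := TStreamAdapter.Create(MS, soReference);
    OleCheck(Persist.Save(Adapter, True));
    MS.Position := 0; // rewind before reading the saved bytes
    // Assumption: the default encoding detection of LoadFromStream is
    // good enough for the page in question.
    Memo.Lines.LoadFromStream(MS);
  finally
    MS.Free;
  end;
end;
```

Inside the event handler this would be called as `LoadBrowserSourceIntoMemo(WebBrowser1, Memo1);`, where `WebBrowser1` and `Memo1` are placeholder component names.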

Remy Lebeau
2

Why not quick and dirty? In the OnNavigateComplete2() event handler:

    Form1.RichEdit1.Text := WebBrowser1.OleObject.Document.documentElement.outerHTML;
Adrian Mole