I'm using Delphi's TWebBrowser component to load web pages that I want to parse, and these pages use JavaScript (AJAX?) to render the user-visible HTML. The well-documented methods of extracting the HTML from such pages return a bunch of JavaScript rather than what the user sees. There are answers to similar questions here going back to 2004, and they all return the JavaScript rather than the user-visible HTML. I've seen a couple that suggest alternate ways to access the data, but I have not been able to get any of them to work, nor am I sure how to adapt the code.
My question is: when I load a web page into a TWebBrowser and it is perfectly readable once rendered inside the component, how can I extract the HTML that is ultimately rendered there, rather than the JS code that generates it?
In my case, I'm trying to load a Google search results page, but I've heard this is also an issue on lots of news sites such as the Wall Street Journal, the Washington Post, and the NYTimes.
var
  url: string;
  d: OleVariant;
begin
  // enter something like "dentist in baltimore" in a Google search,
  // then copy the contents of the ADDRESS field that it generates and
  // paste it here:
  url := '... paste URL Google generates here ...';
  WebBrowser1.Navigate2( url, 0 {nav_flags} );
  // I have an OnNavigate2 handler here, but I'm guessing this works as well
  d := WebBrowser1.Document;  // late-bound access to the loaded document
  memo1.Lines.Text := d.documentElement.outerHTML;
end;
The problem is, the memo contains ... and it's just a bunch of JavaScript in the HEAD. Nothing in it resembles what the TWebBrowser, or a regular browser window, actually displays to the user for this search.
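One thing the snippet above does not do is wait for the page to finish loading before reading `Document`. Below is a minimal sketch of a wait-then-read variant, assuming a VCL application with a `TWebBrowser` on the form. `WaitForBrowser` and its timeout parameter are hypothetical names introduced here for illustration. Whether this is enough for a Google results page has not been verified, since scripts may keep modifying the DOM after the browser reports `READYSTATE_COMPLETE`.

```pascal
uses
  Winapi.Windows, System.SysUtils, Vcl.Forms, SHDocVw;

// Hypothetical helper (not part of the original code): pumps the message
// loop until the browser reports READYSTATE_COMPLETE or the timeout elapses.
// Returns True if the browser reached the complete state in time.
function WaitForBrowser(Browser: TWebBrowser; TimeoutMs: UInt64): Boolean;
var
  Start: UInt64;
begin
  Start := GetTickCount64;
  while Browser.ReadyState <> READYSTATE_COMPLETE do
  begin
    if GetTickCount64 - Start > TimeoutMs then
      Exit(False);
    Application.ProcessMessages;  // let the embedded browser keep working
    Sleep(10);                    // avoid spinning the CPU
  end;
  Result := True;
end;
```

It would be called between `Navigate2` and the read of `Document`, for example `if WaitForBrowser(WebBrowser1, 15000) then memo1.Lines.Text := d.documentElement.outerHTML;` after assigning `d := WebBrowser1.Document;`. The 15-second timeout is an arbitrary choice.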