I am using .NET Core 3.1 and Puppeteer Sharp 2.0.4. I want to get the full HTML of a web page after its JavaScript has finished running. This is my code:

await new BrowserFetcher().DownloadAsync(BrowserFetcher.DefaultRevision);
Browser browser = await Puppeteer.LaunchAsync(new LaunchOptions
{
    Headless = false
});
var page = await browser.NewPageAsync();
page.DefaultTimeout = 0;
var navigation = new NavigationOptions
{
    Timeout = 0,
    WaitUntil = new[] { WaitUntilNavigation.DOMContentLoaded }
};
await page.GoToAsync("https://someurl", navigation);
var content = await page.GetContentAsync();

It looks like the content variable does not contain the HTML produced after the JavaScript has finished running. What should I change to make it work?

Josh Correia
DinaF
  • First, you don't need SetJavaScriptEnabledAsync. Second, you should find a way to know that the page is ready. Is there an element that only exists once the page is ready? A JavaScript variable? – hardkoded Jan 30 '21 at 19:23
  • I have found this element: WaitUntilNavigation.DOMContentLoaded. I have edited my code accordingly and it works. Thanks a lot! – DinaF Feb 05 '21 at 19:11
  • Hi, could you please share the solution? – fizmhd May 06 '21 at 13:17
  • I have edited the code in the answer, the code there now is the one that works. – DinaF May 29 '21 at 12:37

1 Answer

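Replacing the navigation options with WaitUntilNavigation.Networkidle2 worked for me. With this setting, navigation is considered finished once there have been no more than 2 network connections for at least 500 ms, by which point the page's JavaScript has usually finished executing.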

using PuppeteerSharp;

await new BrowserFetcher().DownloadAsync(BrowserFetcher.DefaultRevision);
Browser browser = await Puppeteer.LaunchAsync(new LaunchOptions
{
    Headless = true // false if you need to see the browser
});
var page = await browser.NewPageAsync();
page.DefaultTimeout = 5000; // in milliseconds; set to 0 to disable the timeout
await page.GoToAsync("https://www.google.com", WaitUntilNavigation.Networkidle2);
var content = await page.GetContentAsync();

Console.WriteLine(content);
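
If network idleness is not a reliable signal for a particular page, the approach suggested in the comments (waiting for an element that only exists once the page is ready) might look roughly like the sketch below. The selector "#results" is a hypothetical placeholder, not something from the original page, and the 10-second timeout is an arbitrary choice; substitute whatever element the target page renders last.

```csharp
using System;
using System.Threading.Tasks;
using PuppeteerSharp;

public static class Program
{
    public static async Task Main()
    {
        await new BrowserFetcher().DownloadAsync(BrowserFetcher.DefaultRevision);
        var browser = await Puppeteer.LaunchAsync(new LaunchOptions
        {
            Headless = true
        });
        try
        {
            var page = await browser.NewPageAsync();
            await page.GoToAsync("https://someurl", WaitUntilNavigation.DOMContentLoaded);

            // Assumption: "#results" stands in for an element that the page's
            // JavaScript creates when it has finished rendering.
            await page.WaitForSelectorAsync("#results", new WaitForSelectorOptions
            {
                Timeout = 10000 // milliseconds; arbitrary value for illustration
            });

            var content = await page.GetContentAsync();
            Console.WriteLine(content);
        }
        finally
        {
            // Close the browser even if navigation or the wait throws.
            await browser.CloseAsync();
        }
    }
}
```

If the readiness signal is a JavaScript variable rather than an element, page.WaitForFunctionAsync with an expression that checks that variable can be used in place of WaitForSelectorAsync.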
sappho192