I'm developing a web application that monitors changes in other websites. Some of these websites contain a lot of framesets and frames.

I'm using the below code:

    var chromeOption = new ChromeOptions();
    chromeOption.AddArgument("--headless");
    Console.WriteLine("Getting into the Application");
    using (var driver = new ChromeDriver(chromeOption))
    {
        Console.WriteLine("Loading the Web Page");

        driver.Navigate().GoToUrl("http://www.xyz.dk/");
        var htmltxt = driver.PageSource;
    }

The page source it returns is:

<!DOCTYPE html><html xmlns="http://www.w3.org/1999/xhtml"><head>
    <title>Mr X. Consulting</title>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />    
</head>
<frameset cols="25%,50%,25%" frameborder="0">
  <frame src="border.html" />
  <frame src="jjc.html" />
  <frame src="border.html" />
</frameset>

PageSource does not include the source of the frames. I have searched a lot online, including here, but didn't find any useful info.

My question is: how can I load all the frames and get the whole page source, the way Inspect Element in Chrome shows it?

Thanks Heveen

2 Answers

I don't think that you can (easily) get the combined page source as you see it in "Inspect Element", because each frame has its own source, similar to what you see when you right-click on a page and select View Page Source or View Frame Source.

You can, however, get the page source of every frame by traversing the frames recursively as follows:

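    // Requires: using System.Collections.Generic; using System.Linq; using OpenQA.Selenium;

    private static List<string> GetAllSources(IWebDriver driver)
    {
        var sources = new List<string>();
        // Start from the top-level document, whatever frame was focused before.
        driver.SwitchTo().DefaultContent();
        AddFrameSources(driver, sources);
        return sources;
    }

    private static void AddFrameSources(IWebDriver driver, List<string> sources)
    {
        // Source of the document that currently has focus.
        sources.Add(driver.PageSource);
        var frames = driver.FindElements(By.TagName("frame"));
        var iframes = driver.FindElements(By.TagName("iframe"));
        // Concat is enough here: the two collections cannot overlap.
        foreach (var frame in frames.Concat(iframes))
        {
            driver.SwitchTo().Frame(frame);
            AddFrameSources(driver, sources);
            // Return to the containing document before visiting the next frame.
            driver.SwitchTo().ParentFrame();
        }
    }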
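
For a tool that monitors changes, it may also help to know which frame each source came from. The following is only a sketch of the same recursive traversal: the class name `FrameSourceDump`, the method `GetSourcesByPath` and the path labels such as "top/1" are made up for illustration, and the URL is the placeholder from the question.

```csharp
using System;
using System.Collections.Generic;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

public static class FrameSourceDump
{
    // Hypothetical helper: maps a label such as "top/1" (second frame of the
    // top-level document) to the source of that document.
    public static Dictionary<string, string> GetSourcesByPath(IWebDriver driver)
    {
        var sources = new Dictionary<string, string>();
        driver.SwitchTo().DefaultContent();
        Collect(driver, "top", sources);
        return sources;
    }

    private static void Collect(IWebDriver driver, string path,
                                Dictionary<string, string> sources)
    {
        sources[path] = driver.PageSource;
        // One CSS selector matches both <frame> and <iframe> elements.
        var frames = driver.FindElements(By.CssSelector("frame, iframe"));
        for (var i = 0; i < frames.Count; i++)
        {
            driver.SwitchTo().Frame(frames[i]);
            Collect(driver, path + "/" + i, sources);
            driver.SwitchTo().ParentFrame();
        }
    }

    public static void Main()
    {
        var options = new ChromeOptions();
        options.AddArgument("--headless");
        using (var driver = new ChromeDriver(options))
        {
            driver.Navigate().GoToUrl("http://www.xyz.dk/");
            foreach (var entry in GetSourcesByPath(driver))
            {
                Console.WriteLine("=== " + entry.Key + " ===");
                Console.WriteLine(entry.Value);
            }
        }
    }
}
```

Comparing the stored value for each label between two runs would then show which frame changed, provided the frame layout itself stays the same.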
Arnon Axelrod

When you open a URL through Navigate().GoToUrl(), Selenium's focus remains on the top-level browsing context. Hence, when you read PageSource in the very next step, you get the HTML of the top-level browsing context, which contains the <frame> tags but not the documents they load.

Judging by the page source you posted, you may have trimmed the attributes of the <frame> tags. You can retrieve the HTML of each frame by switching to the individual frames, as in the code block below:

driver.Navigate().GoToUrl("http://www.xyz.dk/");
Console.WriteLine("HTML of Top Level Browsing Context : ");
Console.WriteLine(driver.PageSource);
// First frame: FindElement returns the first match for border.html.
driver.SwitchTo().Frame(driver.FindElement(By.XPath("//frame[@src='border.html']")));
Console.WriteLine("HTML of first border frame : ");
Console.WriteLine(driver.PageSource);
driver.SwitchTo().ParentFrame();
driver.SwitchTo().Frame(driver.FindElement(By.XPath("//frame[@src='jjc.html']")));
Console.WriteLine("HTML of jjc frame : ");
Console.WriteLine(driver.PageSource);
driver.SwitchTo().ParentFrame();
// Third frame: both border frames share the same src, so select the second match explicitly.
driver.SwitchTo().Frame(driver.FindElement(By.XPath("(//frame[@src='border.html'])[2]")));
Console.WriteLine("HTML of second border frame : ");
Console.WriteLine(driver.PageSource);
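
Two of the three frames in the question share the same src, so an XPath on src alone cannot tell them apart. As a rough sketch, the frames can instead be visited by zero-based index with SwitchTo().Frame(int). The class name `FrameByIndex` is made up for illustration, the URL is the placeholder from the question, and the loop assumes the page contains only <frame> elements, as in the posted frameset.

```csharp
using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

public static class FrameByIndex
{
    public static void Main()
    {
        var options = new ChromeOptions();
        options.AddArgument("--headless");
        using (var driver = new ChromeDriver(options))
        {
            driver.Navigate().GoToUrl("http://www.xyz.dk/");

            // Count the frames while focus is still on the top-level document.
            int frameCount = driver.FindElements(By.TagName("frame")).Count;

            for (int i = 0; i < frameCount; i++)
            {
                // Switch by position, so identical src values are not a problem.
                driver.SwitchTo().Frame(i);
                Console.WriteLine("HTML of frame " + i + " : ");
                Console.WriteLine(driver.PageSource);
                // Go back up before selecting the next sibling frame.
                driver.SwitchTo().ParentFrame();
            }
        }
    }
}
```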

Notes

This is just a vanilla solution to show how to acquire the HTML within a frame. Further, you can implement the following enhancements:

undetected Selenium