Can a website's generated HTML be saved using Canopy? Looking at the documentation under 'Getting Started', I could not find anything related.
2 Answers
You can run arbitrary JavaScript using `js`, and `document.documentElement.outerHTML` will return the current DOM, so
`let html = js "return document.documentElement.outerHTML" |> string`
does the trick.
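To actually save the result, a minimal sketch might look like the following. It assumes Canopy 2.x, where the helpers live in `canopy.classic` (older releases used `open canopy` instead), and that a browser has already been started and pointed at the target URL. The function name `saveGeneratedHtml` and the output path are hypothetical.

```fsharp
open System.IO
open canopy.classic // assumption: Canopy 2.x; older releases used `open canopy`

/// Hypothetical helper: writes the generated HTML of the page that is
/// currently loaded in the active browser to the given file.
let saveGeneratedHtml (path: string) =
    // Wait until the browser reports the document as fully loaded.
    // This does not cover content that scripts add later.
    waitFor (fun () -> js "return document.readyState" |> string = "complete")
    // `js` returns obj, so convert the result to a string before saving.
    let html = js "return document.documentElement.outerHTML" |> string
    File.WriteAllText(path, html)
```

After `start` and `url` have run, calling `saveGeneratedHtml "page.html"` would write the current DOM to `page.html`.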

-
While your answer looks promising, the problem is that it returns the generated HTML of the default WebDriver page, i.e. "WebDriver ...", and not that of the specified URL, which seems to take some time to load. Inelegantly adding some `Thread.Sleep` calls and looping to wait for the target HTML to load throws `NoSuchWindowException`. – bugfoot Nov 30 '16 at 21:26
-
Unfortunately there is no (single) solution to that problem. In general, you cannot tell whether a page (including JavaScript) has finished loading / executing or whether there are some actions waiting (e.g. through `setTimeout`) or even just infinite animations. If you know the page you can wait for some indicators: see `waitFor`, `waitForElement`, e.g. `waitFor (fun () -> js "return document.readyState" |> string = "complete")`. Anyways, this should be a separate question (probably going to be closed as duplicate). – CaringDev Nov 30 '16 at 22:13
-
In fact I overlooked the [required configuration for IE](https://github.com/SeleniumHQ/selenium/wiki/InternetExplorerDriver#required-configuration). With everything set, and your `waitFor` code, it can indeed save the _generated_ HTML. Much thanks :-) – bugfoot Dec 02 '16 at 07:04
Canopy is a wrapper around Selenium that provides some useful helper functions. But it also provides access to the Selenium `IWebElement` instances in case you need them, via the `element` function (halfway down the page; there don't seem to be internal anchors in that page so I couldn't link directly to the function). Then once you have the `IWebElement` object, your problem becomes similar to this one, where the answer seems to be `elem.getAttribute("innerHTML")` where `elem` is the element whose content you want (which might even be the `html` element). Note that the `innerHTML` attribute is not a standard DOM attribute, so this won't work with all Selenium drivers; it will be dependent on which browser you're running in. But it apparently works on all major Web browsers.
See Get HTML Source of WebElement in Selenium WebDriver using Python for a related question using Python, which has more discussion about whether the `innerHTML` attribute will work in all browsers. If it doesn't, Canopy also has the `js` function, which you could leverage to run some JavaScript to get the HTML you're looking for -- but if you're having trouble with that, you probably need to ask a JavaScript question rather than an F# question.
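As a rough sketch of the `element` approach in F#: the .NET Selenium binding spells the method `GetAttribute` (capitalised), and the module to open is assumed here to be `canopy.classic` from Canopy 2.x (older releases used `open canopy`). The function name `innerHtmlOf` is hypothetical, and the code assumes a browser session is already open on the target page.

```fsharp
open canopy.classic // assumption: Canopy 2.x; older releases used `open canopy`

/// Hypothetical helper: returns the inner HTML of the first element
/// matching the given selector on the currently loaded page.
let innerHtmlOf (selector: string) : string =
    // `element` hands back Selenium's IWebElement, not a string.
    let elem = element selector
    // Read the browser's innerHTML property through the Selenium API.
    elem.GetAttribute("innerHTML")
```

Calling `innerHtmlOf "html"` would then return the markup inside the root element, while `innerHtmlOf "#content"` would do the same for a single node with that id.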
-
When I try to add `let html = element "html" |> string`, it throws `NoSuchWindowException`, which is strange, since there should be an html element in the generated HTML page. – bugfoot Nov 30 '16 at 21:35
-
What happens when you load that page in a browser, then open up the Developer Tools to look at its contents? Does it have an "html" element? – rmunn Nov 30 '16 at 23:02
-
Also, why are you doing `|> string` there? The idea is to get hold of the `IWebElement` **object** and use its properties; converting it to a string seems counterproductive. – rmunn Nov 30 '16 at 23:03