10

I can successfully display a website in WebView2 in my VB.NET (Visual Studio 2017) project, but I cannot get the HTML source code. Please advise me how to get the HTML.

My code:

Private Sub testbtn_Click(sender As Object, e As EventArgs) Handles testbtn.Click
        WebView2.CoreWebView2.Navigate("https://www.microsoft.com/")
End Sub

Private Sub WebView2_NavigationCompleted(sender As Object, e As CoreWebView2NavigationCompletedEventArgs) Handles WebView2.NavigationCompleted
        Dim html As String = ?????
End Sub

Thank you in advance for your advice.

Andrew Mortimer
Tom
  • I've never used a `WebView2` control and there seems to be little information around about this but I suspect that it starts [here](https://learn.microsoft.com/en-us/microsoft-edge/webview2/reference/dotnet/0-9-538/microsoft-web-webview2-core-corewebview2#getdevtoolsprotocoleventreceiver). I think the reason that it's not well documented is that it's part of Chromium. – jmcilhinney Jun 17 '20 at 15:06
  • Does this answer your question? [How I get page source from WebView?](https://stackoverflow.com/questions/9966760/how-i-get-page-source-from-webview) – J. Scott Elblein Jun 17 '20 at 19:18
  • Also, https://stackoverflow.com/questions/29654149/get-source-code-from-webview-vb-for-metro – J. Scott Elblein Jun 17 '20 at 19:18
  • Thank you. I have read through the document but still cannot find the answer. I also tried the link "https://stackoverflow.com/questions/29654149/get-source-code-from-webview-vb-for-metro", but unfortunately "Await myWebView.InvokeScriptAsync" is marked as an error and does not work. – Tom Jun 17 '20 at 22:56

4 Answers

30

I've only just started messing with the WebView2 earlier today as well, and was just looking for this same thing. I did manage to scrape together this solution:

Dim html As String
html = Await WebView2.ExecuteScriptAsync("document.documentElement.outerHTML;")

' The Html comes back with unicode character codes, other escaped characters, and
' wrapped in double quotes, so I'm using this code to clean it up for what I'm doing.
html = Regex.Unescape(html)
html = html.Remove(0, 1)
html = html.Remove(html.Length - 1, 1)

I converted this from C# to VB on the fly, so hopefully I didn't introduce any syntax errors.

Xaviorq8
  • Fantastic, thank you. I can now get the HTML source code from WebView2 with the following code. I really appreciate it.
        Private Sub testbtn_Click() Handles testbtn.Click
            wv.CoreWebView2.Navigate("https://www.microsoft.com/")
        End Sub
        Private Async Sub wv_NavigationCompleted() Handles wv.NavigationCompleted
            Dim html As String = String.Empty
            html = Await wv.ExecuteScriptAsync("document.documentElement.outerHTML;")
            html = Regex.Unescape(html)
            html = html.Remove(0, 1)
            html = html.Remove(html.Length - 1, 1)
        End Sub
    – Tom Jun 19 '20 at 05:20
  • But what about simply invoking the "View Page Source" command in WebView2? Can we do that? I know we can display it via a hot key, so why not "on demand"? This command would display the source in a popup window. – Andrew Truckle Apr 18 '22 at 18:54
  • Your answer was exactly what I needed. I was using WebBrowser.DocumentStream to load HtmlAgilityPack.HtmlDocument. Now I am converting to WebView2 and I could not get a valid document into HtmlAgilityPack. Your answer solved the problem. Perhaps I will post my code tomorrow in case it would help someone. – Ken Smith Sep 28 '22 at 05:52
3

Adding to @Xaviorq8's answer: in C# you can slice a Span to avoid creating the intermediate strings that the two Remove calls produce:

html = Regex.Unescape(html);
html = html.AsSpan()[1..^1].ToString();
JohnyL
1

I must credit @Xaviorq8; his answer was needed to solve my problem. I was successfully using .NET WebBrowser and Html Agility Pack but I wanted to replace WebBrowser with .NET WebView2.

Snippet (working code with WebBrowser):
using HAP = HtmlAgilityPack;
HAP.HtmlDocument hapHtmlDocument = null;
hapHtmlDocument = new HAP.HtmlDocument();
hapHtmlDocument.Load(webBrowser1.DocumentStream);
HAP.HtmlNodeCollection nodes = hapHtmlDocument.DocumentNode.SelectNodes("//*[@id=\"apptAndReportsTbl\"]");

Snippet (failing code with WebView2):
using HAP = HtmlAgilityPack;
HAP.HtmlDocument hapHtmlDocument = null;
string html = await webView21.ExecuteScriptAsync("document.documentElement.outerHTML");
hapHtmlDocument = new HAP.HtmlDocument();
hapHtmlDocument.LoadHtml(html);
HAP.HtmlNodeCollection nodes = hapHtmlDocument.DocumentNode.SelectNodes("//*[@id=\"apptAndReportsTbl\"]");

Success with WebView2 and Html Agility Pack:

using HAP = HtmlAgilityPack;
HAP.HtmlDocument hapHtmlDocument = null;
string html = await webView21.ExecuteScriptAsync("document.documentElement.outerHTML");
// thanks to @Xaviorq8 answer (next 3 lines)
html = Regex.Unescape(html);
html = html.Remove(0, 1);
html = html.Remove(html.Length - 1, 1);
hapHtmlDocument = new HAP.HtmlDocument();
hapHtmlDocument.LoadHtml(html);
HAP.HtmlNodeCollection nodes = hapHtmlDocument.DocumentNode.SelectNodes("//*[@id=\"apptAndReportsTbl\"]");
Ken Smith
0

The accepted answer is on the right track. However, it is missing one important thing:

The returned string is not HTML-encoded; it is JSON!

So to do it right, you need to deserialize the JSON, which is just as simple:

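' Requires: Imports System.Text.Json
Dim html As String
html = Await WebView2.ExecuteScriptAsync("document.documentElement.outerHTML;")
' ExecuteScriptAsync returns the script result as JSON, so decode the JSON string.
html = JsonSerializer.Deserialize(Of String)(html)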
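
To see the JSON-decoding step on its own, without a WebView2 control, here is a minimal console sketch. It assumes System.Text.Json is available (.NET Core 3.0 or later, or the NuGet package). The module name `JsonDecodeSketch` and the sample string `raw` are hypothetical; `raw` only imitates the kind of JSON string literal that ExecuteScriptAsync is described as returning above.

```vb
Imports System
Imports System.Text.Json

Module JsonDecodeSketch
    Sub Main()
        ' Hypothetical stand-in for the value returned by ExecuteScriptAsync:
        ' a JSON string literal, including the surrounding double quotes,
        ' with \uXXXX, \\ and \" escape sequences inside it.
        Dim raw As String = """\u003Chtml>\u003Cbody>C:\\temp \""hi\""\u003C/body>\u003C/html>"""

        ' Decoding the JSON literal removes the outer quotes and resolves
        ' every escape sequence in one step.
        Dim html As String = JsonSerializer.Deserialize(Of String)(raw)

        Console.WriteLine(html)
        ' Prints: <html><body>C:\temp "hi"</body></html>
    End Sub
End Module
```

Because the decoder follows the JSON grammar, there is no need to strip the first and last characters by hand.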
Poul Bak