
Using VB.NET or C#, how do I get the generated HTML source?

To get the HTML source of a page I can use the code below, but it won't return the generated source: it won't contain any of the HTML that was added dynamically by JavaScript in the browser. How do I get the final generated HTML source?

Thanks

WebRequest req = WebRequest.Create("http://www.asp.net"); 
WebResponse res = req.GetResponse(); 
StreamReader sr = new StreamReader(res.GetResponseStream()); 
string html = sr.ReadToEnd();

If I try the code below instead, it returns the document without the content injected by the JavaScript code.

Public Class Form1

    Dim WB As WebBrowser = Nothing

    Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load

        WB = New WebBrowser()
        Me.Controls.Add(WB)
        AddHandler WB.DocumentCompleted, AddressOf WebBrowser1_DocumentCompleted


        WB.Navigate("mysite/Default.aspx")

    End Sub

    Private Sub WebBrowser1_DocumentCompleted(sender As Object, e As WebBrowserDocumentCompletedEventArgs)


        'Dim htmlcode As String = WebBrowser1.Document.Body.OuterHtml()
        Dim s As String = WB.DocumentText

    End Sub
End Class

HTML returned

<!DOCTYPE html>

<html xmlns="http://www.w3.org/1999/xhtml">
<head runat="server">
    <title></title>

</head>
<body>
    <form id="form1" runat="server">
    <div id="center_text_panel">
    //test text  this text should be here
    </div>
    </form>
</body>
</html>

    <script type="text/javascript">

        document.getElementById("center_text_panel").innerText = "test text";


    </script>
Hello-World

3 Answers


You can use WebKit.NET

Look here for official tutorials

This can not only grab the source, but also process the page's JavaScript during the page load.

webKitBrowser1.Navigate(MyURL)

Then, handle the DocumentCompleted event, and:

Dim documentContent As String = webKitBrowser1.DocumentText

Edit - This might be the better open source WebKit option: http://code.google.com/p/open-webkit-sharp/

Brian Webster

Just put a WebBrowser control on your form and use the following code:

webBrowser1.Navigate("YourLink");

private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    // Or read each field or element individually, or use webBrowser1.DocumentText
    string htmlcode = webBrowser1.Document.Body.InnerHtml;
}

Edited

To also get the HTML that is generated dynamically by JavaScript, you have two options:

  1. Run the following code after the webBrowser1_DocumentCompleted event:
     StringBuilder htmlcode = new StringBuilder();
     foreach (HtmlElement item in webBrowser1.Document.All)
     {
         htmlcode.Append(item.InnerHtml);
     }
  2. Write a JavaScript function that returns document.documentElement.innerHTML and use the InvokeScript function to return the result:
     var htmlcode = webBrowser1.Document.InvokeScript("javascriptcode");
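As a minimal sketch of the second option, the page could define a function like the one below, and its name could then be passed to InvokeScript in place of "javascriptcode". The name getGeneratedHtml and the optional doc parameter are assumptions made for illustration; they do not come from the answer. The sketch also assumes you control the page (as with Default.aspx in the question), so that the function can be added to it. It uses outerHTML rather than innerHTML so that the html element itself is included.

```javascript
// Hypothetical helper (the name is illustrative, not part of any library).
// Returns the live DOM serialized as HTML, including nodes that scripts
// added after the original source was parsed.
// Written in ES5 style because the WebBrowser control hosts an older IE engine.
function getGeneratedHtml(doc) {
    // When called through InvokeScript no argument is passed,
    // so fall back to the page's global document.
    var target = doc || document;
    return target.documentElement.outerHTML;
}
```

Note that this only reflects the DOM at the moment of the call. Content added later by timers or AJAX callbacks would need the call to be made after those scripts have finished.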
KF2
  • Thanks, that's great, but it returns the source, not the generated source. – Hello-World Feb 13 '13 at 07:31
  • To get dynamically generated code you must use extra JavaScript code. If you add more details about what you want to do, I can show a solution (or add some extra code). – KF2 Feb 13 '13 at 07:34
  • Hi - WebBrowser.DocumentText needs to return the generated HTML code with the JavaScript injected into it. Do you think this might need to be done asynchronously? Thanks for your help. – Hello-World Feb 13 '13 at 08:20
  • The string returned is just the HTML before the JavaScript runs. – nguyenhoai890 Mar 23 '17 at 10:28

You can use this code:

webBrowser1.Document.Body.OuterHtml
Bugs
ngochoaitn