4

Hello developers, I want to read external content from a website, such as the text between an element's tags. I am using the WebBrowser control, and here is my code. However, this code just fills my WebBrowser control with the web page:

public MainWindow()
{
    InitializeComponent();

    wbMain.Navigate(new Uri("http://www.annonymous.com", UriKind.RelativeOrAbsolute));
}
Rob
user1528573

3 Answers

4

You can use the Html Agility Pack library to parse any HTML formatted data.

HtmlDocument doc = new HtmlDocument();
// LoadHtml parses an HTML string; Load expects a file path or stream.
doc.LoadHtml(wbMain.DocumentText);

var nodes = doc.DocumentNode.SelectNodes("//a[@href]");

NOTE: The method SelectNodes accepts XPath, not CSS or jQuery selectors.

var node = doc.DocumentNode.SelectNodes("id('my_element_id')");
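
To make the XPath point concrete, here is a minimal sketch, assuming the Html Agility Pack package is referenced. The HTML string and the `id='text'` element are made up for illustration; in practice the markup would come from the page you loaded.

```csharp
using System;
using HtmlAgilityPack;

class XPathDemo
{
    static void Main()
    {
        // Hypothetical markup standing in for the page's HTML.
        const string html =
            "<html><body><div id='text'>Hello</div>" +
            "<a href='http://example.com/'>link</a></body></html>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html); // parses a string; Load() expects a path or stream

        // XPath, not CSS: "//div[@id='text']" rather than "#text".
        var div = doc.DocumentNode.SelectSingleNode("//div[@id='text']");
        if (div != null)
            Console.WriteLine(div.InnerText);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var links = doc.DocumentNode.SelectNodes("//a[@href]");
        if (links != null)
        {
            foreach (var a in links)
                Console.WriteLine(a.GetAttributeValue("href", string.Empty));
        }
    }
}
```

Passing a CSS-style selector such as `"#text"` is what produces the "Expression must evaluate to a node-set" exception, because the string is not a valid XPath node-set expression.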
Tomislav Markovski
  • Whenever I try to access a node I get the exception "Expression must evaluate to a node-set." – user1528573 Jul 19 '12 at 12:25
  • var nodes = doc.DocumentNode.SelectNodes("#text"); – user1528573 Jul 19 '12 at 12:55
  • 1
    Okay, you need to use XPath as the selector, not CSS or jQuery selectors. For example, if your DIV has id="text", you would use `doc.DocumentNode.SelectNodes("//div[starts-with(@id,'text')]");` I have updated my answer. – Tomislav Markovski Jul 19 '12 at 13:03
4

As I understand your question, you only want to parse the HTML data and don't need to show the actual web page. If that is the case, then you can take a very simple approach and use HttpWebRequest:

    // Requires: using System.IO; using System.Net;
    string plainText;
    var request = (HttpWebRequest)WebRequest.Create("http://www.google.com");
    request.Timeout = 5000; // milliseconds
    request.Method = "GET";
    using (var response = (HttpWebResponse)request.GetResponse())
    using (var stream = response.GetResponseStream())
    using (var reader = new StreamReader(stream))
    {
        // Read the whole response body as text.
        plainText = reader.ReadToEnd();
    }
2

Try this:

dynamic doc = wbMain.Document;
var htmlText = doc.documentElement.InnerHtml;

edit: Taken from here.
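
One thing to watch for: the document is only available once navigation has finished, so reading it right after `Navigate` in the constructor may find nothing. A rough sketch of waiting for the page first, assuming `wbMain` is a WPF `WebBrowser` declared in the window's XAML (the handler name `OnLoadCompleted` is just an illustration):

```csharp
using System;
using System.Diagnostics;
using System.Windows;
using System.Windows.Navigation;

public partial class MainWindow : Window
{
    public MainWindow()
    {
        InitializeComponent();

        // Subscribe before navigating so the handler runs once the page is loaded.
        wbMain.LoadCompleted += OnLoadCompleted;
        wbMain.Navigate(new Uri("http://www.annonymous.com", UriKind.RelativeOrAbsolute));
    }

    private void OnLoadCompleted(object sender, NavigationEventArgs e)
    {
        // Document is a COM object, so it is accessed through dynamic here.
        dynamic doc = wbMain.Document;
        string htmlText = doc.documentElement.InnerHtml;

        // htmlText can now be handed to an HTML parser.
        Debug.WriteLine(htmlText.Length);
    }
}
```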

Community
Darajan
  • 3
    Copy of [this answer](http://stackoverflow.com/a/5762063/1136211). You should better refer to an existing answer. The question answered there may also be helpful to the asker here. – Clemens Jul 19 '12 at 11:50
  • @Clemens Thanks for the heads up. Not new to the site but new to actually trying to contribute. – Darajan Jul 19 '12 at 11:54
  • @Tomislav Markovski Whenever I try to access a node I get the exception "Expression must evaluate to a node-set." Can you give me examples of how to use nodes? – user1528573 Jul 19 '12 at 12:26