Is it possible to scrape a web page using HttpWebRequest or WebClient and display only a specific div or table, like the example below?

This is one div from a page containing other divs, just to give you an example of the structure.

    <div id="DIV5">
        <table cellspacing="0" cellpadding="0"><tbody>
            <tr class=""></tr>
            <tr class="last"></tr>
        </tbody></table>
    </div>

I have this simple code, which displays the HTML of the whole page, but I am looking for a way to display only one div or one table.

using System;
using System.IO;
using System.Net;
using System.Windows.Forms;

namespace SimpleScreenScrape
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        private void button1_Click(object sender, EventArgs e)
        {
            string html = this.GetWebsiteHtml(this.textBox1.Text);
            this.richTextBox1.Text = html;
        }

        // Downloads the page at the given URL and returns its full HTML as a string.
        private string GetWebsiteHtml(string url)
        {
            WebRequest request = WebRequest.Create(url);
            // The using blocks dispose the response, stream, and reader even if an exception is thrown.
            using (WebResponse response = request.GetResponse())
            using (Stream stream = response.GetResponseStream())
            using (StreamReader reader = new StreamReader(stream))
            {
                return reader.ReadToEnd();
            }
        }
    }
}
arunkumar
Rhys
  • Take a look at http://htmlagilitypack.codeplex.com/ ; with it you can select the nodes that you want to display. – Chandu Aug 31 '11 at 20:20

1 Answer

Generally speaking, once you have the HTML document (stored in your result variable), you can parse it and display only the parts of it that you want.

I suggest you use a dedicated HTML parser such as the HTML Agility Pack - this will allow you to easily extract only the HTML you are interested in.
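
As a rough illustration of that suggestion, here is a minimal sketch that pulls a single div out of the downloaded HTML. It assumes the HtmlAgilityPack NuGet package is referenced; the class name `DivExtractor` and the method `ExtractDivById` are hypothetical names made up for this example, and the XPath assumes the id contains no single quotes.

```csharp
using System;
using HtmlAgilityPack; // third-party package, not part of the .NET Framework

public static class DivExtractor
{
    // Hypothetical helper: returns the outer HTML of the div with the given id,
    // or null when the page contains no such div.
    public static string ExtractDivById(string html, string divId)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // HTML Agility Pack tolerates malformed markup

        // XPath: any div element whose id attribute equals divId.
        // Assumption: divId contains no single quotes.
        string xpath = string.Format("//div[@id='{0}']", divId);
        HtmlNode node = doc.DocumentNode.SelectSingleNode(xpath);

        return node == null ? null : node.OuterHtml;
    }
}
```

In the question's click handler this could be used roughly as `this.richTextBox1.Text = DivExtractor.ExtractDivById(html, "DIV5") ?? "div not found";`. To get only the table, a path such as `//div[@id='DIV5']//table` should work the same way.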

Oded
  • This is the way to go! Don't try to parse your HTML with a regex! You'll just end up crafting something unholy and unmaintainable that will fail when dealing with malformed HTML. – CalebD Aug 31 '11 at 20:31
  • @calebd see [this answer](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) for exactly why you shouldn't use regex to parse HTML ;) – Jason Aug 31 '11 at 20:33
  • Cheers, I just used HTML Agility Pack, what a great tool. Now I have the problem of converting this back into a table: Sep07:0007:00DEPARTEDUS5339NZ303Christchurch01 Sep07:0007:00DEPARTEDNZ101 Sydney01 Sep07:0007:00DEPARTEDAC6105NZ101Sydney01 Sep07:0007:00 CA5111NZ101Sydney01 Sep07:0007:00 CO6771NZ101Sydney01 Sep07:0007:00DEPARTEDDJ8001NZ101Sydney01 – Rhys Aug 31 '11 at 20:34
  • @Rhys - Comments are not a good place for code samples. I suggest asking a new question... – Oded Aug 31 '11 at 20:36