Download an Entire Website in C#

Question

Forgive my ignorance on the subject

I am using

 string p="http://" + Textbox2.text;
 string r= textBox3.Text;
 System.Net.WebClient webclient=new
 System.Net.Webclient();
 webclient.DownloadFile(p,r);

to download a webpage. Can you please help me with enhancing the code so that it downloads the entire website. Tried using HTML Screen Scraping but it returns me only the href links of the index.html files. How do i proceed ahead

Thanks

Did you get your problem resolved? – Jason Jan 22 '10 at 21:19 — Jason, Jan 22 '10 at 21:19

score 10 · Answer 1 · answered Jan 19 '10 at 07:21

10

Scraping a website is actually a lot of work, with a lot of corner cases.

Invoke wget instead. The manual explains how to use the "recursive retrieval" options.

answered Jan 19 '10 at 07:21

Will

73,905
40
169
246

Jason · Answer 2 · 2010-01-19T08:46:10.463

 protected string GetWebString(string url)
    {
        string appURL = url;
        HttpWebRequest wrWebRequest = WebRequest.Create(appURL) as HttpWebRequest;
        HttpWebResponse hwrWebResponse = (HttpWebResponse)wrWebRequest.GetResponse();

        StreamReader srResponseReader = new StreamReader(hwrWebResponse.GetResponseStream());
        string strResponseData = srResponseReader.ReadToEnd();
        srResponseReader.Close();
        return strResponseData;
    }

This puts the webpage into a string from the supplied URL.

You can then use REGEX to parse through the string.

This little piece gets specific links out of craigslist and adds them to an arraylist...Modify to your purpose.

 protected ArrayList GetListings(int pages)
    {
            ArrayList list = new ArrayList();
            string page = GetWebString("http://albany.craigslist.org/bik/");

            MatchCollection listingMatches = Regex.Matches(page, "(<p><a href=\")(?<LINK>/.+/.+[.]html)(\">)(?<TITLE>.*)(-</a>)");
            foreach (Match m in listingMatches)
            {
                list.Add("http://albany.craigslist.org" + m.Groups["LINK"].Value.ToString());
            }
            return list;
    }

+1, also remember to parse all text files (html, css) as them can have links to other resources — Rubens Farias, Jan 19 '10 at 08:41

Download an Entire Website in C#

2 Answers2

Linked