
Our company site runs on the intranet, so only company employees can view it. I need to write code that collects all the links from one particular page of the site. The code below doesn't work in my case.

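    // Requires: using System.Net; using System.Text.RegularExpressions;
    using (var client = new WebClient())
    {
        client.Headers.Add("X-FORMS_BASED_AUTH_ACCEPTED", "f");
        client.Headers.Add(HttpRequestHeader.UserAgent, "Other");
        client.Headers.Add(HttpRequestHeader.Cookie, "security=true");
        // Authenticate as the currently logged-in Windows user
        client.Credentials = CredentialCache.DefaultCredentials;

        string pattern = @"(<a.*?>.*?</a>)";

        MatchCollection hreflist;

        // Download the raw HTML of the page
        string Url = client.DownloadString("https://collaborate.citi.net/docs/DOC-908807");

        // Find every <a>...</a> element in the downloaded page;
        // Singleline lets a link span line breaks, IgnoreCase also matches <A>
        hreflist = Regex.Matches(Url, pattern, RegexOptions.Singleline | RegexOptions.IgnoreCase);
        int linkCount = hreflist.Count;
    }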

I am struggling with this line:

    string Url = client.DownloadString("https://collaborate.citi.net/docs/DOC-908807");

Is there a better way to achieve this? I have been stuck on this for a while, and any suggestion would help a lot.
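
A minimal sketch of the HttpClient route, assuming the intranet site uses Windows Integrated Authentication (NTLM/Kerberos), which is what `CredentialCache.DefaultCredentials` in the code above relies on. On HttpClient the counterpart is `HttpClientHandler.UseDefaultCredentials = true`. The helper names `CreateIntranetHandler` and `DownloadPageAsync` are hypothetical, and the program assumes .NET 6+ top-level statements; it only downloads when a URL is passed on the command line.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// UseDefaultCredentials sends the logged-in Windows user's credentials
// (NTLM/Kerberos), like CredentialCache.DefaultCredentials on WebClient.
static HttpClientHandler CreateIntranetHandler() =>
    new HttpClientHandler { UseDefaultCredentials = true };

// Downloads the page as a string; a 401 Unauthorized (or any other error
// status) surfaces as an HttpRequestException instead of being ignored.
static async Task<string> DownloadPageAsync(HttpClient client, string url)
{
    using HttpResponseMessage response = await client.GetAsync(url);
    response.EnsureSuccessStatusCode();
    return await response.Content.ReadAsStringAsync();
}

// Only touch the network when a URL is given on the command line.
if (args.Length > 0)
{
    using var client = new HttpClient(CreateIntranetHandler());
    client.DefaultRequestHeaders.Add("X-FORMS_BASED_AUTH_ACCEPTED", "f"); // header from the question
    string html = await DownloadPageAsync(client, args[0]);
    Console.WriteLine($"Downloaded {html.Length} characters");
}
```

For example, `dotnet run -- https://collaborate.citi.net/docs/DOC-908807`. If this still ends in 401, the site probably expects a different scheme (Basic or a forms login page), and the credentials have to be supplied another way.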

  • Unrelated, but: _"**Important** - We don't recommend that you use the WebClient class for new development. Instead, use the System.Net.Http.HttpClient class."_ - [Remarks on WebClient](https://learn.microsoft.com/en-us/dotnet/api/system.net.webclient?view=net-5.0#remarks) – Fildor Jul 15 '21 at 11:22
  • Do not use Regex to process HTML. (Not linking the infamous Answer ... ;P) Use one of the various HTML parsers out there. – Fildor Jul 15 '21 at 11:24
  • @Fildor: Do you have any examples or code for the System.Net.Http.HttpClient class? I am completely new to this. – Tausif Khan Jul 15 '21 at 11:25
  • See https://stackoverflow.com/a/41778175/982149 – Fildor Jul 15 '21 at 11:28
  • And also https://stackoverflow.com/a/2248422/982149 - which probably solves both steps in one. – Fildor Jul 15 '21 at 11:30
  • Does this answer your question? [Get all links on html page?](https://stackoverflow.com/questions/2248411/get-all-links-on-html-page) – Fildor Jul 15 '21 at 11:31
  • @Fildor: I am getting 401 Unauthorized with the first example. Obviously I have to pass my username and password as credentials to access this site, but I don't know how to pass credentials with HttpClient. – Tausif Khan Jul 15 '21 at 12:01
  • Yes, you can add headers to your request. There are several ways to do this: at the "default" level or "per request". See the answers to this question: https://stackoverflow.com/q/12022965/982149 – Fildor Jul 15 '21 at 12:09
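
A sketch for the last two comments, assuming you know which authentication scheme the site uses. With Windows authentication, explicit credentials go on the handler as a `NetworkCredential`; with HTTP Basic, they go in an `Authorization` header. It also shows the two header levels mentioned: `DefaultRequestHeaders` (every request from that client) and `HttpRequestMessage.Headers` (a single request). `CreateHandler`, `BasicAuth`, `BuildRequest` and the argument order are hypothetical; a forms-based login page would need a different approach (posting the login form and keeping the session cookie), which is not shown.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;

// Windows (NTLM/Kerberos) authentication: the handler answers the server's
// 401 challenge with these explicit credentials.
static HttpClientHandler CreateHandler(string user, string password, string domain) =>
    new HttpClientHandler { Credentials = new NetworkCredential(user, password, domain) };

// HTTP Basic authentication: "Authorization: Basic base64(user:password)".
static AuthenticationHeaderValue BasicAuth(string user, string password) =>
    new AuthenticationHeaderValue(
        "Basic", Convert.ToBase64String(Encoding.UTF8.GetBytes($"{user}:{password}")));

// Per-request headers: only this HttpRequestMessage carries them.
static HttpRequestMessage BuildRequest(string url)
{
    var request = new HttpRequestMessage(HttpMethod.Get, url);
    request.Headers.Add("X-FORMS_BASED_AUTH_ACCEPTED", "f"); // header from the question
    return request;
}

// Hypothetical usage: <url> <user> <password> <domain>
if (args.Length == 4)
{
    using var client = new HttpClient(CreateHandler(args[1], args[2], args[3]));
    // Default headers: sent with every request made through this client.
    client.DefaultRequestHeaders.UserAgent.ParseAdd("Other");

    using HttpRequestMessage request = BuildRequest(args[0]);
    // For a Basic-auth site, use a plain HttpClient and set instead:
    // request.Headers.Authorization = BasicAuth(args[1], args[2]);
    using HttpResponseMessage response = await client.SendAsync(request);
    Console.WriteLine($"{(int)response.StatusCode} {response.StatusCode}");
}
```

Keeping the password out of source code (for example, reading it from the command line as here, or using the default credentials of the logged-in user) is generally preferable to hard-coding it.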
