
Hi, I am writing an HTML parser to help with some job duties. I can open the site in Internet Explorer, but when I request it from C# code I get an error.

I have tried using:

client.Credentials = CredentialCache.DefaultNetworkCredentials;
client.Proxy.Credentials = CredentialCache.DefaultCredentials;

I don't get the requested page, only an error page. If I can view the page in Internet Explorer, there must be a way to retrieve its HTML in C#.

(Note that the same page requires authentication in other browsers, but not in IE.)

james
  • What is `client` - a WCF service client? – jrummell Nov 27 '12 at 15:32
  • Who is running IE when it works, and what do their IE security settings say about automatic logon? – Jodrell Nov 27 '12 at 15:34
  • @Jodrell same user as the one running the C# application. – james Nov 27 '12 at 15:36
  • IE allows the currently logged-on user to authenticate automatically with other MS services (assuming that they share the same domain/AD); other browsers request the login details via a 401. Look at changing your .Credentials to use the NetworkCredential class (`client.Credentials = new NetworkCredential("username", "password");`). – Gavin Nov 27 '12 at 15:37
  • Hate to burst your bubble, but please look at http://stackoverflow.com/questions/56107/what-is-the-best-way-to-parse-html-in-c and `HtmlAgilityPack` in particular. If it can't do what you want (which I doubt), contribute a changeset. – Richard Schneider Nov 27 '12 at 15:38
  • @RichardSchneider I am not sure you understood my issue. The question you linked discusses parsing, not authentication. I might have misunderstood you, but it seems irrelevant to the question asked. – james Nov 27 '12 at 15:45
  • Read about `HtmlAgilityPack`; it does most things well. See http://htmlagilitypack.codeplex.com/ – Richard Schneider Nov 27 '12 at 15:48
  • @james, your question is irrelevant. HtmlAgilityPack does screen scraping and deals with authentication. All the code is already there, so why would you want to reinvent it? You are just making your "job duties" harder. – Richard Schneider Dec 06 '12 at 17:09

2 Answers


You could try this library: https://github.com/HtmlUnit/NHtmlUnit

You can use HtmlUnit to perform HTML operations programmatically. Further information can be found at http://blog.stevensanderson.com/2010/03/30/using-htmlunit-on-net-for-headless-browser-automation/

Daniel Lane

The issue was with the request headers. WebClient sends no User-Agent header by default, and I guess the site I was trying to connect to returns an error page when a request has no user agent.

I added the following line to set the header to match my IE user agent:

WebClient client = new WebClient();
client.Credentials = CredentialCache.DefaultNetworkCredentials;
client.Proxy.Credentials = CredentialCache.DefaultCredentials;
client.Headers.Add ("user-agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.0.3705;)");
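
For completeness, here is a minimal sketch of how the lines above might fit into a full download call. The URL is a hypothetical placeholder (it is not from my actual setup), and the error handling is only one possible way to see what the server returned, so treat this as an illustration rather than the exact code I ran.

```csharp
using System;
using System.Net;

class Program
{
    // Hypothetical URL for illustration; replace with the page you need.
    const string Url = "http://intranet.example.com/page.html";

    static void Main()
    {
        using (var client = new WebClient())
        {
            // Use the identity of the currently logged-on Windows user.
            client.Credentials = CredentialCache.DefaultNetworkCredentials;
            if (client.Proxy != null)
            {
                client.Proxy.Credentials = CredentialCache.DefaultCredentials;
            }

            // WebClient sends no User-Agent by default, so set one explicitly.
            client.Headers.Add(HttpRequestHeader.UserAgent,
                "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.0.3705;)");

            try
            {
                string html = client.DownloadString(Url);
                Console.WriteLine("Downloaded {0} characters.", html.Length);
            }
            catch (WebException ex)
            {
                // Print the HTTP status, if any, to help tell an
                // authentication failure apart from other server errors.
                var response = ex.Response as HttpWebResponse;
                if (response != null)
                {
                    Console.WriteLine("Server returned {0} ({1}).",
                        (int)response.StatusCode, response.StatusDescription);
                }
                else
                {
                    Console.WriteLine("Request failed: " + ex.Message);
                }
            }
        }
    }
}
```

If the status printed is 401, the problem is more likely the credentials than the user agent.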
james
  • Is `client` an instance of `WebClient`? We still don't know what you are doing ... – jrummell Nov 27 '12 at 18:03
  • @jrummell You are correct; I added the code. `client` is a `WebClient`. I was getting an error page from the server when connecting with my C# code, and the above fixed it. – james Nov 27 '12 at 18:50