
I started with the simple code below to download the HTML of a web page into a string for later processing. It works for some sites, such as Digikey, but not for others, such as Mouser.

I have tried adding headers and a User-Agent to the WebClient, and converting the URL to a Uri, with no success. Does anybody have other suggestions for what I could try? Or could someone try to get the code working and let me know how it goes?

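using System.Net;            // WebClient
using System.Windows.Forms;  // MessageBox

// The URL must be a single string literal terminated by a semicolon
String url = "http://www.mouser.com/ProductDetail/Vishay-Thin-Film/PCNM2512E1000BST5/?qs=sGAEpiMZZMu61qfTUdNhG6MW4lgzyHBgo9k7HJ54G4u10PG6pMa7%252bA%3d%3d";
WebClient web = new WebClient();
String html = web.DownloadString(url);  // download the page body as a string
MessageBox.Show(html);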

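EDIT: The link should lead here: http://www.mouser.com/ProductDetail/Vishay-Thin-Film/PCNM2512E1000BST5/?qs=sGAEpiMZZMu61qfTUdNhG6MW4lgzyHBgo9k7HJ54G4u10PG6pMa7%252bA%3d%3d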

EDIT: I also tried the following code, with no luck:

using System.Net;            // WebClient, HttpRequestHeader
using System.Windows.Forms;  // MessageBox

String url = "http://www.mouser.com/ProductDetail/Vishay-Thin-Film/PCNM2512E1000BST5/?qs=sGAEpiMZZMu61qfTUdNhG6MW4lgzyHBgo9k7HJ54G4u10PG6pMa7%252bA%3d%3d";
WebClient web = new WebClient();
// Send a browser-like User-Agent in case the server rejects non-browser clients
web.Headers[HttpRequestHeader.UserAgent] = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.2 (KHTML, like Gecko) Chrome/15.0.874.121 Safari/535.2";
String html = web.DownloadString(url);
MessageBox.Show(html);
Tyler Fontaine
  • Does that URL work in the browser? – NicoRiff Mar 10 '17 at 21:29
  • You get the Mouser error/404 page. Not what is wanted, I think, but you should still get a string. – Chris Watts Mar 10 '17 at 21:38
  • It's weird: when I paste the link here, it turns into an error page. I'll try to figure out how to post it correctly. – Tyler Fontaine Mar 10 '17 at 22:02
  • It's plausible that their web server is looking at the user agent, noticing that it isn't a browser, and refusing to serve up content. I'd pretend to be a real browser. Find out a real user agent, and use that to make the request. – Lynn Crumbling Mar 10 '17 at 22:04
  • The real link should lead here: http://www.mouser.com/ProductDetail/Vishay-Thin-Film/PCNM2512E1000BST5/?qs=sGAEpiMZZMu61qfTUdNhG6MW4lgzyHBgo9k7HJ54G4u10PG6pMa7%252bA%3d%3d – Tyler Fontaine Mar 10 '17 at 22:04
  • See http://stackoverflow.com/a/11841680/656243 for setting a user agent. – Lynn Crumbling Mar 10 '17 at 22:05
  • I have already tried setting the user agent, with no luck; I actually tried it a few different ways. – Tyler Fontaine Mar 10 '17 at 22:07
  • I'm getting back a 405, method not allowed. – Lynn Crumbling Mar 10 '17 at 22:11
  • That's weird, because I'm not receiving anything at all, not even a 405. Try going to the actual web page link I posted and using that URL; the one in the code snippet seems to paste incorrectly. I noticed that when I use just http://www.mouser.com/ it works fine, but any other page on the site fails. – Tyler Fontaine Mar 10 '17 at 22:12
  • They clearly have some sophisticated mechanism in place to prevent scraping. – Lynn Crumbling Mar 10 '17 at 22:12
  • Try hitting the URL with wget -- that's how I got the 405. – Lynn Crumbling Mar 10 '17 at 22:13
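The comments above disagree on what the server returns (a 405 via wget versus "nothing at all" from the C# code). `WebClient.DownloadString` throws a `WebException` when the server answers with an error status such as 405, so catching that exception and reading `ex.Response` shows the status code and the error page the server sent. A minimal sketch; `PageFetcher`, `DownloadOrDescribe` and `Describe` are hypothetical names, not part of any library.

```csharp
using System;
using System.IO;
using System.Net;

// Hypothetical helper: downloads a page and, on failure, reports what the
// server actually sent back instead of failing with no visible output.
public static class PageFetcher
{
    public static string DownloadOrDescribe(string url)
    {
        using (var web = new WebClient())
        {
            try
            {
                return web.DownloadString(url);
            }
            catch (WebException ex)
            {
                return Describe(ex);
            }
        }
    }

    // Summarizes a WebException: HTTP status code and body if the server
    // responded, otherwise the transport-level status and message.
    public static string Describe(WebException ex)
    {
        if (ex.Response is HttpWebResponse response)
        {
            using (response)
            using (var reader = new StreamReader(response.GetResponseStream()))
            {
                return $"HTTP {(int)response.StatusCode} {response.StatusDescription}\n{reader.ReadToEnd()}";
            }
        }
        return $"{ex.Status}: {ex.Message}";
    }
}
```

Usage, in place of the original two lines: `MessageBox.Show(PageFetcher.DownloadOrDescribe(url));`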

1 Answer


You need to download Fiddler. It's free (it was originally developed at Microsoft) and it lets you record browser sessions. Launch it, open Chrome or whatever browser you use, and go through the steps. Once you're done, you can stop capturing and inspect every request and response, including the raw data sent.

This makes it easy to spot the differences between the requests your code sends and the ones the browser sends.
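To compare the two in Fiddler, the WebClient traffic has to pass through Fiddler as well. .NET often picks Fiddler up automatically through the system proxy settings while it is capturing; setting the proxy explicitly is a way to make sure. A minimal sketch, assuming Fiddler listens on its default address 127.0.0.1:8888 (check the port in your Fiddler's connection options); `FiddlerDebug` and `CreateClientThroughFiddler` are hypothetical names.

```csharp
using System;
using System.Net;

// Hypothetical helper: routes WebClient requests through a locally running
// Fiddler instance so they appear next to the captured browser session.
public static class FiddlerDebug
{
    // Assumption: Fiddler's default listening address and port.
    public const string FiddlerHost = "127.0.0.1";
    public const int FiddlerPort = 8888;

    public static WebClient CreateClientThroughFiddler()
    {
        var web = new WebClient();
        // Send every request from this client via the Fiddler proxy.
        web.Proxy = new WebProxy(FiddlerHost, FiddlerPort);
        return web;
    }
}
```

If the site redirects to HTTPS, Fiddler's HTTPS decryption must be enabled before the page contents show up in the capture.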

There are also many free tools that take your request/response data and generate the C# code for you, such as Request To Code. That is not the only one; I'm not at work and can't recall the one I use there, but there are plenty to choose from.
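Whether you use a generator or do it by hand, the idea is to copy the headers from the captured browser request onto your own request. A minimal sketch with WebClient; the header values are placeholders (the User-Agent is the one from the question), so paste the exact values from your own Fiddler capture. `BrowserLikeClient` is a hypothetical name.

```csharp
using System;
using System.Net;

// Hypothetical helper: builds a WebClient that sends browser-like headers.
// All values are placeholders; replace them with the ones from your capture.
public static class BrowserLikeClient
{
    // Placeholder User-Agent (the one used in the question).
    public const string UserAgent =
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.2 (KHTML, like Gecko) Chrome/15.0.874.121 Safari/535.2";

    public static WebClient Create()
    {
        var web = new WebClient();
        web.Headers[HttpRequestHeader.UserAgent] = UserAgent;
        // Placeholder Accept and Accept-Language values typical of a browser.
        web.Headers[HttpRequestHeader.Accept] =
            "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
        web.Headers[HttpRequestHeader.AcceptLanguage] = "en-US,en;q=0.8";
        // If the capture shows a Cookie header, the site may depend on it:
        // web.Headers[HttpRequestHeader.Cookie] = "<value from your capture>";
        return web;
    }
}
```

Which headers Mouser actually checks is not established in this thread; comparing the two captures is how you would find out.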

Hope this helps

Kelly