I am very new to coding. `client.DownloadString` works on most sites, but I have come across a site that seems to block the attempt to load the webpage and always returns the same blocking response. I was hoping someone could tell me a way around this. Here is the code and the page that won't load:

  ServicePointManager.Expect100Continue = True
  ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12
  Dim client = New WebClient()
  TextBox1.Text = client.DownloadString("https://www.somesite.com/samsung-series-9-7-1-4-channel.html")

Any tips would be appreciated

  • `DownloadString` performs a simple HTTP GET. It's the remote site itself that has decided it doesn't like you, probably because you hammered it with too many requests, or because your IP appears too many times in their logs without any obvious reason (i.e. a purchase). Server owners *really don't like* paying for servers and power just to serve crawlers, especially when the crawler is used by a competitor. To them, your code *is* malware. – Panagiotis Kanavos May 09 '19 at 10:51
  • What response do you get? What status code? A 429 response means that you sent too many requests and just need to slow down; it doesn't mean you got blocked, you just have to throttle your requests (see the first sketch after these comments). A different status code though, like a 400 or 404, or even a timeout, may mean you got blocked. In that case you won't be able to make requests again until they remove the block. – Panagiotis Kanavos May 09 '19 at 10:57
  • Thanks Panagiotis, not a competitor; I'm trying to automate a tool for one of their sub-divisions and the code stopped working recently. I can access their pages from Chrome so the IP is not being blocked. It detects ROBOT crawl behaviour (which is the case) and blocks it… the response is – newbie2019 May 09 '19 at 11:43
  • That's not a rejection, that's a response specifically created for crawlers. I suspect the server saw there's no `User-Agent` header in the request, realized the request came from a bot and sent back the canned response. Add a `User-Agent` header [as shown here](https://stackoverflow.com/questions/11841540/setting-the-user-agent-header-for-a-webclient-request) (see the second sketch after these comments). – Panagiotis Kanavos May 09 '19 at 12:27
  • Different browsers send different `User-Agent` strings. You can google for a list, or you can use `Fiddler` or your browser's Developer Tools to inspect requests in the `Network` tab and see what your browser sends. – Panagiotis Kanavos May 09 '19 at 12:30
  • They *really* hate bots. I tried sending a User Agent header a couple of times and now my *browser* displays a verification page! Just ask them what you should do to avoid rejections. They may be able to just whitelist your IP and be done with it – Panagiotis Kanavos May 09 '19 at 12:37
  • Thank you so much Panagiotis, that worked! Legend! – newbie2019 May 09 '19 at 22:02
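
A minimal sketch of the status-code check suggested in the comments, assuming the same WinForms context as the snippet in the question (`client` and `TextBox1` come from there, with `Imports System.Net` at the top of the file); `WebException.Response` only carries an `HttpWebResponse` when the server actually answered:

  Try
      TextBox1.Text = client.DownloadString("https://www.somesite.com/samsung-series-9-7-1-4-channel.html")
  Catch ex As WebException
      ' When the server answers with an error status, the response travels on the exception.
      Dim response = TryCast(ex.Response, HttpWebResponse)
      If response IsNot Nothing Then
          ' 429 = too many requests, just slow down; other codes (403, 404) may mean a real block.
          MessageBox.Show("Status code: " & CInt(response.StatusCode))
      Else
          ' No HTTP response at all, e.g. a timeout or DNS failure.
          MessageBox.Show("No response: " & ex.Status.ToString())
      End If
  End Try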

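And a sketch of the `User-Agent` fix that resolved the question, again assuming the question's snippet; the Chrome-style string below is only an example placeholder, so copy whatever string your own browser sends (visible in the Developer Tools `Network` tab):

  ServicePointManager.Expect100Continue = True
  ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12
  Dim client = New WebClient()
  ' Placeholder User-Agent copied from a desktop Chrome browser; substitute your own browser's string.
  client.Headers(HttpRequestHeader.UserAgent) = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"
  TextBox1.Text = client.DownloadString("https://www.somesite.com/samsung-series-9-7-1-4-channel.html")

`WebClient` may clear request headers between calls, so it is safest to set the `User-Agent` again before each `DownloadString` if you make more than one request.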
0 Answers