
Here's the code I'm trying to run:

using System.Net;

var wc = new WebClient();
var stream = wc.OpenRead(
             "http://en.wikipedia.org/wiki/List_of_communities_in_New_Brunswick");

But I keep getting a 403 Forbidden error, and I don't understand why. It worked fine for other pages, and I can open this page fine in my browser. How can I fix this?

mpen

1 Answer


I wouldn't normally use OpenRead(); try DownloadData() or DownloadString() instead.

It might also be that Wikipedia is deliberately blocking your request because you have not provided a user-agent string:

using System.Net;

WebClient client = new WebClient();
// Identify as a known browser; requests with an empty user agent are often rejected.
client.Headers.Add("user-agent",
    "Mozilla/5.0 (Windows; Windows NT 5.1; rv:1.9.2.4) Gecko/20100611 Firefox/3.6.4");

I use WebClient quite often, and I learned quickly that websites can and will block your request if you don't provide a user-agent string that matches a known web browser. If you make up your own user-agent string (e.g. "my super cool web scraper"), you will also be blocked.
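
For completeness, here is a minimal sketch that puts the two suggestions together: it fetches the page from the question with DownloadString() while sending a browser-like user-agent header. The console program wrapper and variable names are just for illustration.

using System;
using System.Net;

class Program
{
    static void Main()
    {
        var client = new WebClient();

        // Identify as a known browser so the server does not reject the request.
        client.Headers.Add("user-agent",
            "Mozilla/5.0 (Windows; Windows NT 5.1; rv:1.9.2.4) Gecko/20100611 Firefox/3.6.4");

        // DownloadString returns the whole response body as a string in one call.
        string html = client.DownloadString(
            "http://en.wikipedia.org/wiki/List_of_communities_in_New_Brunswick");

        Console.WriteLine(html.Length + " characters downloaded");
    }
}

If you really do need a stream, the same header trick applies: add the user-agent header before calling OpenRead().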

[Edit]

I changed my example user agent string to that of a modern version of Firefox. The original example I gave was the user agent string for IE6, which is not a good idea. Why? Some websites filter on IE6 and redirect anyone using that browser to a different page that says "Please update your browser" - which means you will not get the content you were after.

JK.
  • Using `DownloadString()` in place of `OpenRead()` will work with or without specifying the user agent. If you prefer to use `OpenRead()` for whatever reason, appending the user agent string to the headers *does* work. – Nathan Taylor Jan 27 '11 at 01:07
  • Just odd because it was working fine for some of the other provinces... adding the user-agent string did fix it for NB though. Thanks! Why should `DownloadString` make any difference though? It connects in the same way, doesn't it? – mpen Jan 27 '11 at 01:10
  • @Nathan I would use a well-known user agent anyway. What if a couple of months down the track the website changes and now rejects the empty user agent? Your code breaks without warning and it will be very hard to find the problem. But you can be fairly certain that they will not suddenly start rejecting a known user agent. – JK. Jan 27 '11 at 05:40
  • Not going to argue with that. – Nathan Taylor Jan 27 '11 at 05:49