php and MediaWiki

Question

I am trying to retrieve the XML of a Wikipedia page using the MediaWiki API. The URL I'm using is the following: http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&format=xml&redirects&titles=dog

I've seen this, but it hasn't helped. No matter what I do, I'm not actually getting anything returned to $c, and I can't figure out why. I can do file_get_contents with a plain text file, and it works just fine. Can anyone else verify that this works?

<?php
$url = 'http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&format=xml&redirects&titles=Main%20Page';
$c = file_get_contents($url);
echo $c;
?>

EDIT: I have also tried the cURL version from that page, which doesn't work either:

$url = 'http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&format=xml&redirects&titles=Main%20Page';
$ch = curl_init($url);
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
$c = curl_exec($ch);
echo $c;

Perhaps URL access through file_get_contents is disabled by your hosting company. Have you tried cURL instead? – Twelve47, Apr 12 '11 at 15:30
`Warning: file_get_contents(http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&format=xml&redirects&titles=Main%20Page) [function.file-get-contents]: failed to open stream: HTTP request failed! HTTP/1.0 403 Forbidden in [file]` – Karl Andrew, Apr 12 '11 at 15:33
I've tried cURL too, but I couldn't get that to work either. I've posted it above for reference. – cryptic_star, Apr 12 '11 at 15:36

Twelve47 · Accepted Answer · 2011-04-12T21:05:27.687

4

Wikipedia requires you to specify a descriptive user agent. You can do that with something like this:

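<?php
$url = 'http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&format=xml&redirects&titles=Main%20Page';
$ch = curl_init($url);
// Identify the tool with a descriptive User-Agent; requests without one may get 403 Forbidden.
curl_setopt($ch, CURLOPT_USERAGENT, "MyCoolTool (+http://example.com/MyCoolToolPage/)");
// Return the response as a string instead of printing it directly.
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$c = curl_exec($ch);
echo $c;
?>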

You should use a user-agent string that describes your site. Don't spoof a web browser's user agent, or you may be blocked for appearing suspicious (source: Wikimedia User-Agent policy).
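Since the question started with file_get_contents, here is a minimal sketch of the same idea without cURL: the user agent is passed through an HTTP stream context. The helper names `wikiContextOptions` and `fetchWithUserAgent` and the 10-second timeout are my own assumptions for illustration, not part of any library, and this only works if `allow_url_fopen` is enabled on your host.

```php
<?php
// Hypothetical helper: build HTTP stream-context options that carry
// a descriptive User-Agent string.
function wikiContextOptions(string $userAgent): array
{
    return [
        'http' => [
            'method'     => 'GET',
            'user_agent' => $userAgent,
            'timeout'    => 10, // seconds; an arbitrary choice for this sketch
        ],
    ];
}

// Hypothetical helper: fetch a URL using that User-Agent.
// Returns the response body, or null if the request failed.
function fetchWithUserAgent(string $url, string $userAgent): ?string
{
    $context = stream_context_create(wikiContextOptions($userAgent));
    // Suppress the PHP warning and report failure through the return value.
    $body = @file_get_contents($url, false, $context);
    return $body === false ? null : $body;
}
```

Calling `fetchWithUserAgent($url, "MyCoolTool (+http://example.com/MyCoolToolPage/)")` with the `$url` from the question should then return the XML instead of failing with the 403, provided the host allows outgoing requests. A `null` result means the request still failed, so check for it before echoing.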

edited Apr 12 '11 at 21:05

answered Apr 12 '11 at 15:43


1

Do not use a browser user agent, or you are liable to get your IP address banned by the sysadmins. Use something that identifies your program and contains your email or website address. See [Wikimedia's User-Agent policy](http://meta.wikimedia.org/wiki/User-Agent_policy) for details. – Anomie Apr 12 '11 at 16:40
1

@Anomie, thanks. I've updated my answer to take that into account. – Twelve47 Apr 12 '11 at 21:06
