2

I'm probably not supposed to use file_get_contents(). What should I use instead? I'd like to keep it simple.

Warning: file_get_contents(http://en.wikipedia.org/w/api.php?action=query&titles=Your_Highness&prop=revisions&rvprop=content&rvsection=0): failed to open stream: HTTP request failed! HTTP/1.0 403 Forbidden

TimWolla
Curtis
  • http://www.php.net/manual/en/book.curl.php – Charles Sprayberry Jan 21 '12 at 20:22
  • You can do it with file_get_contents, but if you like it simple, use cURL instead, because you need to handle cookies as described in the [API documentation](http://www.mediawiki.org/wiki/API:Login). Both the PHP native and the cURL ways will let you, but it's much simpler in cURL. – netcoder Jan 21 '12 at 20:33
  • Better than cURL is HTTP_Request2, which you can download via PEAR. It's a much nicer experience, as it is a proper OO library. – Adam Pointer Jan 21 '12 at 20:42
  • @netcoder, you need to handle cookies only if you want to log in. It's not necessary just for accessing the API. – svick Jan 22 '12 at 13:31

4 Answers

13

The problem you are running into here is related to the MW API's User-Agent policy - you must supply a User-Agent header, and that header must supply some means of contacting you.

You can do this with file_get_contents() with a stream context:

$opts = array('http' =>
  array(
    'user_agent' => 'MyBot/1.0 (http://www.mysite.com/)'
  )
);
$context = stream_context_create($opts);

$url = 'http://en.wikipedia.org/w/api.php?action=query&titles=Your_Highness&prop=revisions&rvprop=content&rvsection=0';
var_dump(file_get_contents($url, FALSE, $context));

Having said that, it might be considered more "standard" to use cURL, and this will certainly give you more control:

$url = 'http://en.wikipedia.org/w/api.php?action=query&titles=Your_Highness&prop=revisions&rvprop=content&rvsection=0';

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_USERAGENT, 'MyBot/1.0 (http://www.mysite.com/)');

$result = curl_exec($ch);

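// curl_exec() returns FALSE on failure; compare strictly so an empty body is not treated as an error
if ($result === FALSE) {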
  exit('cURL Error: '.curl_error($ch));
}

var_dump($result);
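
Both variants above hand back the raw response body as a string. As a minimal sketch of what might come next, the following assumes that `format=json` has been added to the URL and that the response uses the API's legacy JSON layout, where the page text sits under `query.pages.<id>.revisions[0]['*']`. The function name `extractFirstRevisionText` and the sample string are hypothetical and only serve as illustration.

```php
<?php
// Hypothetical helper, not part of PHP or the MediaWiki API.
// Assumption: $json is the body of a format=json response in the legacy
// layout, with the text under query.pages.<id>.revisions[0]['*'].
function extractFirstRevisionText($json)
{
    $data = json_decode($json, true);
    if (!is_array($data)
        || !isset($data['query']['pages'])
        || !is_array($data['query']['pages'])) {
        return null; // not JSON, or not the layout assumed above
    }
    // The page ID is not known in advance, so walk the pages.
    foreach ($data['query']['pages'] as $page) {
        if (isset($page['revisions'][0]['*'])) {
            return $page['revisions'][0]['*'];
        }
    }
    return null;
}

// Hand-written sample shaped like such a response (not real API output).
$sample = '{"query":{"pages":{"123":{"title":"Example","revisions":[{"*":"Sample wikitext"}]}}}}';
var_dump(extractFirstRevisionText($sample));
```

Returning null for anything unexpected keeps the caller's error handling in one place; if the API returns a different layout, the key path would need adjusting.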
DaveRandom
1

file_get_contents should work.

file_get_contents('http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=New_York_Yankees&rvprop=timestamp|user|comment|content')

This was previously discussed on Stack Overflow here

Also, some nice looking code samples here

Good Muyis
jon
  • Well it would be nice if file_get_contents just worked, it seems to be more complicated than I thought – Curtis Jan 21 '12 at 22:19
1

The error message you are really receiving is

Scripts should use an informative User-Agent string with contact information, or they may be IP-blocked without notice.

This means that you should provide additional details about yourself when using the API. Your usage of file_get_contents does not send the required User-Agent.

Here is a working example in cURL that identifies itself as a test for this question:

<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://en.wikipedia.org/w/api.php?action=query&titles=Your_Highness&prop=revisions&rvprop=content&rvsection=0&format=xml");
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_USERAGENT, "Testing for http://stackoverflow.com/questions/8956331/how-to-get-results-from-the-wikipedia-api-with-php");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$result = curl_exec($ch);
curl_close($ch);

echo $result;
?>
favo
0

They themselves say in their API documentation:

Use any programming language to make an HTTP GET request for that URL

You need to get the URL right. The following works for me: http://en.wikipedia.org/w/api.php?format=json&action=query&titles=Main%20Page&prop=revisions&rvprop=content

As far as I can tell, you are not specifying the output format!
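
To avoid typos when assembling such a URL by hand, the query string can be built from an array. This is only a sketch: `buildWikipediaQueryUrl` is a hypothetical helper name, and the parameters are simply the ones from the URL in this answer.

```php
<?php
// Hypothetical helper, not part of PHP or the MediaWiki API:
// builds a query URL with an explicit output format.
function buildWikipediaQueryUrl($title, $format = 'json')
{
    $params = array(
        'format' => $format,
        'action' => 'query',
        'titles' => $title,
        'prop'   => 'revisions',
        'rvprop' => 'content',
    );
    // PHP_QUERY_RFC3986 encodes spaces as %20 rather than +.
    return 'http://en.wikipedia.org/w/api.php?'
        . http_build_query($params, '', '&', PHP_QUERY_RFC3986);
}

echo buildWikipediaQueryUrl('Main Page'), "\n";
```

The resulting string can be passed to either file_get_contents() or cURL; the request itself still needs a User-Agent header.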

Community
whizzzkid
  • http://en.wikipedia.org/w/api.php?format=json&action=query&titles=Your_Highness&prop=revisions&rvprop=content&rvsection=0 Your URL works after adding output format... – whizzzkid Jan 21 '12 at 20:37