I'm trying to get Wikipedia pages (from a particular category) using the MediaWiki API. For this I'm following this tutorial (Listing 3, "Listing pages within a category"). My question is: how can I get Wikipedia pages without using Zend Framework? And are there any PHP-based REST clients that don't need to be installed? Zend requires installing its package and doing some configuration first, and I don't want to do all that.

After some googling and investigation I found a tool called cURL; using cURL with PHP can also be used to build a REST service. I'm really new to implementing REST services, but I have already tried to implement something in PHP:

<?php
    header('Content-type: application/xml; charset=utf-8');

    function curl($url) {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        $data = curl_exec($ch);
        curl_close($ch);
        return $data;
    }
    $wiki = "http://de.wikipedia.org/w/api.php?action=query&list=allcategories&acprop=size&acprefix=haut&format=xml";
    $result = curl($wiki);
    var_dump($result);
?>

But I got errors in the result. Could anyone help with this?

UPDATE:

This page contains the following errors:
error on line 1 at column 1: Document is empty
Below is a rendering of the page up to the first error.
bofanda

1 Answer

Sorry for taking so long to reply, but better late than never...

When I run your code on the command line, the output I get is:

string(120) "Scripts should use an informative User-Agent string with contact information, or they may be IP-blocked without notice.
"

So it seems the problem is that you're running into the Wikimedia User-Agent policy by not telling cURL to send a custom User-Agent header. To fix this, follow the advice given at the bottom of that policy page and add lines like the following to your script (alongside the other curl_setopt() calls):

$agent = 'ProgramName/1.0 (http://example.com/program; your_email@example.com)';
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
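For reference, here is a minimal sketch of how the question's function might look with that option added. The name curl_with_agent() and the $agent parameter are my own choices for illustration, and the User-Agent value in the usage comment is a placeholder that you should replace with your own program name and contact details.

```php
<?php
// Sketch only: curl_with_agent() is a hypothetical name, not part of cURL.
// It is the question's curl() function plus one extra option.
function curl_with_agent($url, $agent) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);  // return the body as a string
    curl_setopt($ch, CURLOPT_USERAGENT, $agent);  // identify the script to the server
    $data = curl_exec($ch);                       // false on failure
    curl_close($ch);
    return $data;
}

// Usage (placeholder User-Agent; performs a network request when uncommented):
// $agent = 'ProgramName/1.0 (http://example.com/program; your_email@example.com)';
// $wiki  = "http://de.wikipedia.org/w/api.php?action=query&list=allcategories&acprop=size&acprefix=haut&format=xml";
// $result = curl_with_agent($wiki, $agent);
```

Since curl_exec() returns false when the request fails, it is probably worth checking the result for false before treating it as XML.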

Ps. You probably also don't want to set an application/xml content type unless you're sure that the content actually is valid XML. In particular, the output of var_dump() will not be valid XML, even if the input is.

For testing and development, I'd suggest either running PHP from the command line or using the text/plain content type. Or, if you prefer, use text/html and encode your output with htmlspecialchars().
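As a rough illustration of the last option, the following sketch escapes a raw response so it can be shown safely in an HTML page. render_debug() is a hypothetical helper name of mine, and wrapping the output in a pre element is just one way to keep the line breaks visible.

```php
<?php
// Sketch only: render_debug() is a hypothetical helper for development output.
function render_debug($raw) {
    // Escape <, >, & and quotes so the browser displays the markup as text
    // instead of trying to parse it.
    return '<pre>' . htmlspecialchars($raw, ENT_QUOTES, 'UTF-8') . '</pre>';
}
```

You would then send header('Content-type: text/html; charset=utf-8') instead of application/xml and echo render_debug($result) in place of the var_dump() call.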


Pps. I made this a community wiki answer, since I realized that this question has already been asked and answered before.

Ilmari Karonen