Fetch the description of an article from Wikipedia

Question

I am trying to make an API call to Wikipedia through http://en.wikipedia.org/w/api.php?action=parse&page=Petunia&format=xml, but the XML is full of HTML and CSS tags.

Is there a way to fetch only plain text without tags? Thanks!

Edit 1:

$json = json_decode(file_get_contents('http://en.wikipedia.org/w/api.php?action=parse&page=Petunia&format=json'));
$txt  = strip_tags($json->text);
var_dump($json);

This displays NULL.

Are you sure no error is returned? (I get a 403 when using a command to grab the content; it seems to require an authentication key.) – ajreal, Dec 16 '11 at 11:57
Yes, you are right (my php.ini was set to not display errors). How can I get this key? – croppio.com, Dec 16 '11 at 12:28
@mjonutz docs here -> http://en.wikipedia.org/wiki/Wikipedia:Bots/Requests_for_approval and info on the whole process here -> – Manse, Dec 16 '11 at 14:05

score 1 · Accepted Answer · edited May 23 '17 at 12:03

Question was partially answered here

$url = 'http://en.wikipedia.org/w/api.php?action=parse&page=Petunia&format=json&prop=text';
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);        // return the response body instead of printing it
curl_setopt($ch, CURLOPT_USERAGENT, "TestScript");  // required by wikipedia.org server
$c = curl_exec($ch);
curl_close($ch);

$json = json_decode($c);

// The article HTML is stored under parse -> text -> *
var_dump(strip_tags($json->{'parse'}->{'text'}->{'*'}));

I was not able to use file_get_contents but it works fine with cURL.
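For anyone who would rather stay with file_get_contents, here is a minimal sketch. It assumes the 403 comes from the missing User-Agent header, as the cURL comment above suggests. The helper names (buildWikipediaContext, extractPlainText, fetchArticleText) and the user agent string passed in are illustrative and not part of any library.

```php
<?php
// Build a stream context that sends a User-Agent header with the request.
function buildWikipediaContext(string $userAgent)
{
    return stream_context_create([
        'http' => [
            'method'     => 'GET',
            'user_agent' => $userAgent, // assumption: this is what the server checks
            'timeout'    => 10,
        ],
    ]);
}

// Pull the article HTML out of an action=parse JSON response and strip the tags.
// Returns null when the expected parse -> text -> * field is missing.
function extractPlainText(string $body): ?string
{
    $json = json_decode($body);
    if (!isset($json->parse->text->{'*'})) {
        return null;
    }
    $text = strip_tags($json->parse->text->{'*'});
    return trim(html_entity_decode($text, ENT_QUOTES, 'UTF-8'));
}

// Fetch one page through the API and return its text, or null on failure.
function fetchArticleText(string $page, string $userAgent): ?string
{
    $url = 'http://en.wikipedia.org/w/api.php?' . http_build_query([
        'action' => 'parse',
        'page'   => $page,
        'format' => 'json',
        'prop'   => 'text',
    ]);
    $body = @file_get_contents($url, false, buildWikipediaContext($userAgent));
    return $body === false ? null : extractPlainText($body);
}
```

A call such as fetchArticleText('Petunia', 'TestScript') should then behave like the cURL version, returning null instead of raising warnings when the request fails or the JSON has an unexpected shape.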

edited May 23 '17 at 12:03 by Community
answered Jan 11 '12 at 04:44 by Vaillancourt

I just want to know one more thing: are there any terms and conditions for using wiki content? Is it paid, do I need to show the content with a tag saying "Content from Wikipedia", or are any other special permissions required? – Deepanshu Goyal Dec 06 '13 at 06:12

You should take a look at the [license](https://en.wikipedia.org/wiki/Wikipedia:Text_of_Creative_Commons_Attribution-ShareAlike_3.0_Unported_License) at Wikipedia (and at any other wikis you use). It says you have to attribute the work, so you have to say that the content is from Wikipedia. I would advise against copying without specifying your sources anyway (your reputation is on the line). Using the content through bots/cURL is not forbidden as long as you respect the terms, but if you hammer the site they may want to ban your IP from using their API. Wikipedia is free. – Vaillancourt Dec 06 '13 at 13:42

score 0 · Answer 2 · answered Aug 15 '12 at 08:03

It is possible to fetch the info or description from Wikipedia using XML.

        // $term holds the search term; urlencode() keeps spaces and special characters valid in the URL
        $url = "http://en.wikipedia.org/w/api.php?action=opensearch&search=" . urlencode($term) . "&format=xml&limit=1";
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_HTTPGET, TRUE);            // Use a GET request
        curl_setopt($ch, CURLOPT_POST, FALSE);              // Not a POST request
        curl_setopt($ch, CURLOPT_HEADER, FALSE);            // Do not include response headers in the output
        curl_setopt($ch, CURLOPT_NOBODY, FALSE);            // Return body
        curl_setopt($ch, CURLOPT_VERBOSE, FALSE);           // Minimize logs
        curl_setopt($ch, CURLOPT_REFERER, "");              // Referer value
        curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);    // Skip certificate verification (insecure over https)
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);     // Follow redirects
        curl_setopt($ch, CURLOPT_MAXREDIRS, 4);             // Limit redirections to four
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);     // Return in string
        curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 6.1; he; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8");   // Webbot name
        $page = curl_exec($ch);
        curl_close($ch);
        $xml = simplexml_load_string($page);
        // simplexml_load_string() returns false when the response is not valid XML
        if ($xml !== false && (string)$xml->Section->Item->Description) {
            print_r(array((string)$xml->Section->Item->Text,
            (string)$xml->Section->Item->Description,
            (string)$xml->Section->Item->Url));
        } else {
            echo "sorry";
        }

Note that cURL must be installed on the server. Have a nice day.
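To make the property chain in the code above easier to follow, here is a small sketch that parses a hand-written sample response without touching the network. The sample XML is an assumption that only mirrors the element names the answer reads (Section, Item, Text, Description, Url); a live response may carry more attributes or different content. parseOpenSearchXml is a hypothetical helper name.

```php
<?php
// Parse an opensearch XML response and return the first suggestion,
// or null when the input is not valid XML or has no Item.
function parseOpenSearchXml(string $page): ?array
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_string($page);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($xml === false || !isset($xml->Section->Item)) {
        return null;
    }

    $item = $xml->Section->Item[0];
    return [
        'text'        => (string) $item->Text,
        'description' => (string) $item->Description,
        'url'         => (string) $item->Url,
    ];
}

// Hand-written sample (assumed shape, not a captured server response).
$sample = <<<'XML'
<SearchSuggestion xmlns="http://opensearch.org/searchsuggest2" version="2.0">
  <Query>Petunia</Query>
  <Section>
    <Item>
      <Text>Petunia</Text>
      <Description>Petunia is a genus of flowering plants.</Description>
      <Url>http://en.wikipedia.org/wiki/Petunia</Url>
    </Item>
  </Section>
</SearchSuggestion>
XML;

$result = parseOpenSearchXml($sample);
print_r($result);
```

Keeping the parsing in its own function means the string returned by curl_exec can be passed in directly, and the "sorry" branch becomes a simple null check.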
