0

I am trying to make an API call to Wikipedia through http://en.wikipedia.org/w/api.php?action=parse&page=Petunia&format=xml, but the XML is full of HTML and CSS tags.

Is there a way to fetch only plain text without tags? Thanks!

Edit 1:

$json = json_decode(file_get_contents('http://en.wikipedia.org/w/api.php?action=parse&page=Petunia&format=json'));
$txt  = strip_tags($json->text);
var_dump($json);

This displays NULL.

croppio.com

2 Answers

1

Question was partially answered here

$url = 'http://en.wikipedia.org/w/api.php?action=parse&page=Petunia&format=json&prop=text';
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the response as a string instead of printing it
curl_setopt($ch, CURLOPT_USERAGENT, "TestScript"); // required by wikipedia.org server
$c = curl_exec($ch);

$json = json_decode($c);

// The rendered HTML is stored under parse -> text -> "*"
var_dump(strip_tags($json->{'parse'}->{'text'}->{'*'}));

I was not able to use file_get_contents but it works fine with cURL.
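
For reference, here is a minimal sketch of the same idea with some error checking added. The helper names `wikiParseToPlainText` and `fetchWithUserAgent` are made up for this example, not part of any library. The second helper assumes that `file_get_contents` failed only because no User-Agent header was sent; I have not verified that against the live server, so treat it as a guess.

```php
<?php
// Hypothetical helper: take the JSON body of an action=parse response and
// reduce the rendered HTML to plain text. Returns null if the expected
// parse -> text -> "*" field is missing or the body is not valid JSON.
function wikiParseToPlainText(string $body): ?string
{
    $json = json_decode($body);
    if (!isset($json->parse->text->{'*'})) {
        return null;
    }
    // Remove the HTML tags, then decode entities such as &amp;.
    $text = strip_tags($json->parse->text->{'*'});
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Collapse the runs of whitespace left behind where tags used to be.
    return trim((string) preg_replace('/\s+/', ' ', $text));
}

// Hypothetical helper: fetch a URL with file_get_contents while sending a
// User-Agent through a stream context (assumption: the missing User-Agent
// is what made the plain file_get_contents call fail).
function fetchWithUserAgent(string $url, string $userAgent = 'TestScript'): ?string
{
    $context = stream_context_create([
        'http' => ['user_agent' => $userAgent, 'timeout' => 10],
    ]);
    $body = @file_get_contents($url, false, $context);
    return $body === false ? null : $body;
}

// Possible usage (performs a network request, so it is left commented out):
// $body = fetchWithUserAgent('http://en.wikipedia.org/w/api.php?action=parse&page=Petunia&format=json&prop=text');
// var_dump($body === null ? null : wikiParseToPlainText($body));
```

Keeping the parsing separate from the fetching means the text extraction can be checked against a saved response without hitting the API.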

Vaillancourt
  • I just want to know one more thing: are there any terms and conditions for using wiki content? Is it paid, do I need to show the content with a tag saying "Content from Wikipedia", or are any other special permissions required? – Deepanshu Goyal Dec 06 '13 at 06:12
  • You should take a look at the [license](https://en.wikipedia.org/wiki/Wikipedia:Text_of_Creative_Commons_Attribution-ShareAlike_3.0_Unported_License) at Wikipedia (and at any other wikis you use). It says you have to attribute the work, so you have to say that the content is from Wikipedia. I would advise against copying without citing your sources anyway (your reputation is on the line). Use of the content by bots/cURL is not forbidden as long as you respect the terms, but if you hammer the site they may ban your IP from using their API. Wikipedia is free. – Vaillancourt Dec 06 '13 at 13:42
0

It is possible to fetch a short description from Wikipedia using the opensearch action with XML output.

        $term = "Petunia"; // example search term (assumed here); replace with your own
        $url = "http://en.wikipedia.org/w/api.php?action=opensearch&search=".urlencode($term)."&format=xml&limit=1";
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_HTTPGET, TRUE);            // Use a GET request
        curl_setopt($ch, CURLOPT_POST, FALSE);
        curl_setopt($ch, CURLOPT_HEADER, FALSE);            // Do not include response headers in the output
        curl_setopt($ch, CURLOPT_NOBODY, FALSE);            // Return body
        curl_setopt($ch, CURLOPT_VERBOSE, FALSE);           // Minimize logs
        curl_setopt($ch, CURLOPT_REFERER, "");              // Referer value
        curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);    // Skip certificate verification (insecure if the URL is https)
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);     // Follow redirects
        curl_setopt($ch, CURLOPT_MAXREDIRS, 4);             // Limit redirections to four
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);     // Return in string
        curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 6.1; he; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8");   // User-Agent string
        $page = curl_exec($ch);
        $xml = simplexml_load_string($page);
        // Guard against a failed request or unparsable XML before reading fields
        if ($xml !== false && (string)$xml->Section->Item->Description) {
            print_r(array((string)$xml->Section->Item->Text,
            (string)$xml->Section->Item->Description,
            (string)$xml->Section->Item->Url));
        } else {
            echo "sorry";
        }

Note that the cURL extension must be installed on the server. Have a nice day.
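
The XML handling above can also be pulled into a small function so it can be tried on a saved response without calling the API. This is only a sketch: `firstOpenSearchItem` is a hypothetical name, and it assumes the response uses the `Section`/`Item`/`Text`/`Description`/`Url` layout that the code above reads.

```php
<?php
// Hypothetical helper: return the first suggestion from an opensearch XML
// body as an array, or null if the body is not XML or contains no Item.
function firstOpenSearchItem(string $xmlBody): ?array
{
    // Collect libxml errors internally so malformed input does not emit warnings.
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_string($xmlBody);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($xml === false || !isset($xml->Section->Item)) {
        return null;
    }

    // When several items are present, this reads the first one.
    $item = $xml->Section->Item;
    return [
        'text'        => (string) $item->Text,
        'description' => (string) $item->Description,
        'url'         => (string) $item->Url,
    ];
}
```

A missing `Description` element comes back as an empty string, so the caller can still decide whether to print "sorry" as in the code above.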