I have a task: fetch a Wikipedia article by a user-entered keyword, save it to a database, and then search within the saved articles.
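For the storage and search part of the task, here is a minimal sketch. It assumes SQLite through PDO because the question doesn't name a database. The table name `articles` and the helper functions are hypothetical.

```php
<?php
// Hypothetical schema (assumption): one row per article, title + plain-text content.
function createArticleStore(PDO $pdo): void
{
    $pdo->exec('CREATE TABLE IF NOT EXISTS articles (
        title   TEXT PRIMARY KEY,
        content TEXT NOT NULL
    )');
}

// Insert an article, or overwrite it if the title was saved before (SQLite syntax).
function saveArticle(PDO $pdo, string $title, string $content): void
{
    $stmt = $pdo->prepare('INSERT OR REPLACE INTO articles (title, content) VALUES (:title, :content)');
    $stmt->execute([':title' => $title, ':content' => $content]);
}

// Return titles of saved articles whose content contains $term (substring match).
function searchArticles(PDO $pdo, string $term): array
{
    $stmt = $pdo->prepare("SELECT title FROM articles WHERE content LIKE :pattern ESCAPE '\\' ORDER BY title");
    // Escape LIKE wildcards so the user's keyword is matched literally.
    $escaped = addcslashes($term, '%_\\');
    $stmt->execute([':pattern' => '%' . $escaped . '%']);
    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}

$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
createArticleStore($pdo);
saveArticle($pdo, 'Dog', 'The dog is a domesticated descendant of the wolf.');
saveArticle($pdo, 'Cat', 'The cat is a small carnivorous mammal.');
print_r(searchArticles($pdo, 'wolf'));
```

`LIKE` only does a plain substring match. For a larger collection, a full-text index such as SQLite FTS5 or MySQL `FULLTEXT` would be the usual next step.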
The problem is how to access the API and retrieve data from Wikipedia. I tried this URL (at the beginning I tried the JSON format):
$url = 'https://en.wikipedia.org/w/api.php?action=query&titles=Dog&prop=revisions&rvprop=content&format=xml';
and this PHP code:
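Because the title comes from a user-entered keyword, it is safer to build the query string than to concatenate it by hand. The sketch below uses the same parameters as the URL above. The function name `buildWikiApiUrl` and the JSON default are assumptions.

```php
<?php
// Build a MediaWiki API query URL for a user-supplied keyword.
// Parameters mirror the URL in the question; the function name is hypothetical.
function buildWikiApiUrl(string $keyword, string $format = 'json'): string
{
    $params = [
        'action' => 'query',
        'titles' => $keyword,
        'prop'   => 'revisions',
        'rvprop' => 'content',
        'format' => $format,
    ];
    // http_build_query() URL-encodes each value (spaces become "+").
    return 'https://en.wikipedia.org/w/api.php?' . http_build_query($params);
}

echo buildWikiApiUrl('Golden Retriever'), PHP_EOL;
```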
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the response body instead of printing it
curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']); // reuse the browser's User-Agent
$res = curl_exec($ch);
if ($res === false) {
    echo 'cURL Error: ' . curl_error($ch);
}
curl_close($ch);
var_dump($res);
but nothing happened. Is it possible to access this data with cURL?
In the end, this code worked with the URL above:
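cURL can reach the API. The following is a sketch of a fetch helper that turns silent failures into exceptions carrying the cURL error or HTTP status. It assumes a fixed User-Agent string, because `$_SERVER['HTTP_USER_AGENT']` is only set when the script runs behind a web server and is undefined from the command line. Wikimedia's User-Agent policy also asks API clients to identify themselves. The name `fetchUrl` and the agent string are hypothetical.

```php
<?php
// Fetch a URL with cURL and throw instead of failing silently.
// $userAgent is passed in explicitly (assumption: a fixed string identifying your app).
function fetchUrl(string $url, string $userAgent): string
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow 301/302 redirects
        CURLOPT_CONNECTTIMEOUT => 10,   // seconds to wait for the connection
        CURLOPT_USERAGENT      => $userAgent,
    ]);
    $body = curl_exec($ch);
    $error = curl_error($ch);                          // empty string if no error
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);   // 0 if no response was received

    if ($body === false) {
        throw new RuntimeException('cURL error: ' . $error);
    }
    if ($status >= 400) {
        throw new RuntimeException('HTTP status ' . $status);
    }
    return $body;
}

// Usage (needs network access; the User-Agent value is a placeholder):
// echo fetchUrl('https://en.wikipedia.org/w/api.php?action=query&titles=Dog&prop=revisions&rvprop=content&format=json',
//               'MyWikiSearch/0.1 (you@example.com)');
```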
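ini_set('user_agent', 'TestText'); // User-Agent for PHP's stream wrappers, which DOMDocument::load() uses
$xmlDoc = new \DOMDocument();
$xmlDoc->load($url);                // fetch and parse the API response
echo $xmlDoc->saveXML();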
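With `prop=revisions&rvprop=content`, the API returns the page's raw wikitext (MediaWiki markup) inside a `<rev>` element, which is why the output below looks the way it does. The sketch pulls just that text out of the XML. It assumes a single requested title, and the sample response is a shortened, hypothetical one.

```php
<?php
// Extract the raw wikitext from an action=query&prop=revisions&format=xml response.
// Assumes one requested title; textContent also covers text nested deeper inside <rev>.
function wikitextFromXml(string $xml): ?string
{
    $doc = new DOMDocument();
    if (!$doc->loadXML($xml)) {
        return null; // not well-formed XML
    }
    $revs = $doc->getElementsByTagName('rev');
    if ($revs->length === 0) {
        return null; // e.g. the page does not exist
    }
    return $revs->item(0)->textContent;
}

// Hypothetical, shortened response for illustration.
$sample = '<api><query><pages><page pageid="1" ns="0" title="Dog">'
        . '<revisions><rev xml:space="preserve">{{about|the domestic dog}} The dog is a [[mammal]].</rev></revisions>'
        . '</page></pages></query></api>';
echo wikitextFromXml($sample), PHP_EOL;
```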
and then I get wikitext like this:
{{about|the domestic dog|related species known as "dogs"|Canidae|other uses|Dog (disambiguation)|}} {{Redirect|Doggie|the Danish artist|Doggie (artist)}} {{pp-semi-indef}} {{pp-move-indef}} {{Taxobox | name = Domestic dog | fossil_range = {{Fossil range|0.033|0}}[[Pleistocene]] – [[Recent]] |
How can I make it more readable (text with paragraphs, or at least plain text)?
So, there are two questions:
1. Is it possible to access Wikipedia data with PHP cURL, and how should I improve my code?
2. How do I make the wiki XML output more readable?
My question is mainly about the code, especially the cURL part: why doesn't it work? The answer to the other question only covers Wikipedia API URLs, and changing the URL alone doesn't solve my problem.
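I've found the solution: the request needed CURLOPT_SSL_VERIFYPEER (plus CURLOPT_FOLLOWLOCATION), and requesting prop=extracts with explaintext returns plain text instead of wikitext: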
$url = 'http://en.wikipedia.org/w/api.php?action=query&prop=extracts&format=json&explaintext=&titles=Dog';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow the redirect from http:// to https://
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body as a string
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // skips certificate checks (insecure; pointing CURLOPT_CAINFO at a CA bundle is safer)
curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
$res = curl_exec($ch);
curl_close($ch);
$json = json_decode($res);
$pages = $json->query->pages; // object keyed by page ID
// The page ID is not known in advance; with a single title there is exactly one key.
$wiki_id = '';
foreach ($pages as $key => $value) {
    $wiki_id = $key;
}
echo $content = $pages->$wiki_id->extract;
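The `foreach` loop above only exists to discover the unknown page ID. The sketch below decodes the JSON to arrays and takes the first page with `reset()`. Then, to get "text with paragraphs", it wraps each line of the plain-text extract in `<p>` tags. The function names are hypothetical, and treating every newline in the `explaintext` output as a paragraph break is an assumption.

```php
<?php
// Return the plain-text extract from a prop=extracts&explaintext&format=json response,
// or null if there is none (invalid JSON, no pages, or a missing page).
function extractFromJson(string $json): ?string
{
    $data = json_decode($json, true);
    if (!is_array($data) || empty($data['query']['pages'])) {
        return null;
    }
    $page = reset($data['query']['pages']); // first (only) page, whatever its ID
    return $page['extract'] ?? null;        // missing pages have no "extract" key
}

// Wrap each non-empty line of the extract in an HTML paragraph (assumption: newline = paragraph).
function extractToHtml(string $text): string
{
    $html = '';
    foreach (preg_split('/\n+/', trim($text)) as $line) {
        $line = trim($line);
        if ($line !== '') {
            $html .= '<p>' . htmlspecialchars($line, ENT_QUOTES, 'UTF-8') . "</p>\n";
        }
    }
    return $html;
}

// Hypothetical, shortened response for illustration.
$sample = '{"query":{"pages":{"123":{"pageid":123,"ns":0,"title":"Dog",'
        . '"extract":"The dog is a mammal.\nDogs were domesticated from wolves."}}}}';
echo extractToHtml(extractFromJson($sample));
```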
302 Moved: The document has moved here. (comment by GingerN, Oct 19 '15 at 14:35)