Connecting to the Wikipedia API via PHP cURL

Question

I have a task: fetch a Wikipedia article for an entered keyword, save it to a database, and then search within the saved articles.

The problem is how to access the API and retrieve data from Wikipedia. I've tried this URL (at the beginning I tried the JSON format):

$url = 'https://en.wikipedia.org/w/api.php?action=query&titles=Dog&prop=revisions&rvprop=content&format=xml';

and this PHP code:

$ch=curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']); 
$res = curl_exec($ch);
if (!$res) {
    echo 'cURL Error: '.curl_error($ch);
}
var_dump($res);

but nothing happened. Is it possible to access the data with cURL?

In the end, one piece of code did work with the URL above:
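When a request seems to do "nothing", it can help to make the failure explicit. The following is a minimal sketch, not the original code: `fetch_or_fail` is a hypothetical helper name, the timeouts are arbitrary assumptions, and the default user agent string is borrowed from the `ini_set` example in this post. It compares the result against `false` strictly, so an empty response body is not mistaken for a transport error, and it reports both `curl_errno` and `curl_error`.

```php
<?php
// Hypothetical helper (not from the original post): fetch a URL and
// raise the cURL error instead of failing silently.
function fetch_or_fail(string $url, string $userAgent = 'TestText'): string
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,       // return the body instead of printing it
        CURLOPT_USERAGENT      => $userAgent, // $_SERVER['HTTP_USER_AGENT'] is not set on the CLI
        CURLOPT_CONNECTTIMEOUT => 5,          // assumed values, adjust as needed
        CURLOPT_TIMEOUT        => 15,
    ]);

    $body = curl_exec($ch);
    if ($body === false) {
        // curl_exec() returns false on transport errors such as a failed
        // TLS handshake, DNS failure, or refused connection.
        throw new RuntimeException(
            sprintf('cURL error %d: %s', curl_errno($ch), curl_error($ch))
        );
    }
    return $body;
}
```

Calling `fetch_or_fail($url)` inside a `try`/`catch` block should then show the concrete error number and message, which is usually enough to tell a certificate problem from a network problem.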

ini_set('user_agent','TestText');
$xmlDoc = new \DOMDocument();
$xmlDoc->load($url);
echo($xmlDoc->saveXML());

and then I get text like this:

{{about|the domestic dog|related species known as "dogs"|Canidae|other uses|Dog (disambiguation)|}} {{Redirect|Doggie|the Danish artist|Doggie (artist)}} {{pp-semi-indef}} {{pp-move-indef}} {{Taxobox | name = Domestic dog | fossil_range = {{Fossil range|0.033|0}}[[Pleistocene]] – [[Recent]] |

How can I make it prettier (text with paragraphs, or at least plain text)?

So there are two questions: 1. Is it possible to access Wikipedia data with PHP cURL, and how should I improve my code? 2. How do I make the Wikipedia XML output prettier?

My question is about the code, especially about cURL: why doesn't it work? Also, the answer to the other question only covers Wikipedia API URLs, and I can't solve the problem just by changing the URL.
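One way to avoid raw wikitext is to ask the API for a plain-text extract (`prop=extracts` with `explaintext`), which is the same query the final code in this post uses. The sketch below only builds the request URL; `build_extract_url` is a hypothetical helper name and the `$lang` parameter is an assumption added for illustration.

```php
<?php
// Hypothetical helper: build a query URL that requests a plain-text
// extract (prop=extracts, explaintext) instead of raw wikitext.
function build_extract_url(string $title, string $lang = 'en'): string
{
    $params = [
        'action'      => 'query',
        'prop'        => 'extracts',
        'explaintext' => 1,      // plain text rather than HTML
        'format'      => 'json',
        'titles'      => $title,
    ];
    // RFC 3986 encoding turns spaces into %20 and escapes other characters.
    return "https://{$lang}.wikipedia.org/w/api.php?"
        . http_build_query($params, '', '&', PHP_QUERY_RFC3986);
}

echo build_extract_url('Dog'), PHP_EOL;
```

Building the query with `http_build_query` keeps titles that contain spaces or non-ASCII characters correctly encoded, which hand-written URL strings tend to get wrong.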

I've found the solution: CURLOPT_SSL_VERIFYPEER needed to be set (here to false):

$url = 'http://en.wikipedia.org/w/api.php?action=query&prop=extracts&format=json&explaintext=&titles=Dog';

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']); 
$res = curl_exec($ch);
//$json_data = mb_substr($res, curl_getinfo($ch, CURLINFO_HEADER_SIZE));
curl_close($ch);
$json = json_decode($res);

$content = $json->query->pages;
$wiki_id = '';
foreach ($content as $key => $value) {
    $wiki_id = $key;
}
echo $content = $content->$wiki_id->extract;
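The loop above exists only to find the page ID key. As an alternative sketch, the response can be decoded into an associative array and the first page taken with `reset()`, with a `null` result when the page or its extract is missing. `first_page_extract` is a hypothetical helper name, and the sample JSON is a trimmed, made-up response used only to show the shape.

```php
<?php
// Hypothetical helper: return the extract of the first page in a decoded
// action=query response, or null if the page or the extract is missing.
function first_page_extract(string $json): ?string
{
    $data = json_decode($json, true);
    if (!is_array($data)
        || !isset($data['query']['pages'])
        || !is_array($data['query']['pages'])) {
        return null;
    }

    // Pages are keyed by page ID, so take the first element whatever its key.
    $page = reset($data['query']['pages']);
    if (!is_array($page)) {
        return null;
    }
    return $page['extract'] ?? null;
}

// Made-up sample shaped like the API response (page ID and text are illustrative).
$sample = '{"query":{"pages":{"12345":{"pageid":12345,"title":"Dog",'
        . '"extract":"The dog is a domesticated animal."}}}}';
echo first_page_extract($sample), PHP_EOL;
```

Returning `null` instead of reading a property that may not exist avoids notices when the title does not match any article.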

To get the rendered HTML of a wiki page, you can just append `action=render` to the URL, like this: https://sv.wikipedia.org/wiki/Portal:Huvudsida?action=render — leo, Oct 17 '15 at 20:09
Possible duplicate of [Get Text Content from mediawiki page via API](http://stackoverflow.com/questions/1625162/get-text-content-from-mediawiki-page-via-api) — leo, Oct 17 '15 at 20:10
also, flagging as duplicate, as there are already tens of questions about providing the same answers — leo, Oct 17 '15 at 20:11
But curl seems to be working fine for you already? You seem to be getting the wikitext back just as you should, so you just have to replace the URL! — leo, Oct 18 '15 at 08:28
I wrote that nothing happened with curl. Only \DOMDocument did something. — GingerN, Oct 18 '15 at 08:32
(your code works for me. I suspect you might not have the PHP/curl binding installed, but it's impossible to say without knowing what error messages you get) — leo, Oct 18 '15 at 08:43
I don't get any error messages; nothing happens. And I've checked: curl is installed. — GingerN, Oct 18 '15 at 10:33
Do any errors at all get written to your log file? What [error level](http://php.net/manual/en/errorfunc.configuration.php) do you use? Note that error messages are not written to the screen, you have to look for them in your error.log file or similar. — leo, Oct 18 '15 at 16:49
Also, does curl work for other URLs? If not, can you run curl from the command line? — leo, Oct 18 '15 at 16:50
I'm just confused. There are no errors in the log file either. Below is what I get with the google.com URL. Should it be like that? "302 Moved. The document has moved here." — GingerN, Oct 19 '15 at 14:35
Then I'm out of ideas. Your code works just fine for me, and it appears to be working ok for you with another url. How about fetching the Wikipedia url with curl from the command line, what happens then? — leo, Oct 19 '15 at 17:42
Nothing. I've also tried variations of plain wikipedia.org and nothing happened again. I'm new to curl, and with this strange behaviour I began to doubt whether what I'm doing is right or wrong. I think the only thing left is to try curl with other sites... — GingerN, Oct 19 '15 at 18:27
As often in such strange cases, the solution was simple: CURLOPT_SSL_VERIFYPEER was needed. — GingerN, Oct 20 '15 at 12:29
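Setting CURLOPT_SSL_VERIFYPEER to false makes the request go through, but it also turns off certificate checking. If the underlying cause is that cURL cannot find a CA bundle (an assumption, since the thread never shows the actual error), a possible alternative is to keep verification on and point cURL at a bundle file. In this sketch `verified_curl_options` is a hypothetical helper name and `/path/to/cacert.pem` is a placeholder path, not a real location.

```php
<?php
// Hypothetical helper: cURL options that keep certificate checks enabled
// and tell cURL where to find a CA bundle.
function verified_curl_options(string $url, string $caBundle): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_SSL_VERIFYPEER => true,      // verify the server certificate
        CURLOPT_SSL_VERIFYHOST => 2,         // verify that the host name matches
        CURLOPT_CAINFO         => $caBundle, // placeholder path, adjust for your system
        CURLOPT_USERAGENT      => 'TestText',
    ];
}

$options = verified_curl_options(
    'https://en.wikipedia.org/w/api.php?action=query&prop=extracts&format=json&explaintext=&titles=Dog',
    '/path/to/cacert.pem'
);

// Usage (needs a real CA bundle at the given path):
// $ch = curl_init();
// curl_setopt_array($ch, $options);
// $res = curl_exec($ch);
```

The bundle location can also be configured once through the `curl.cainfo` directive in php.ini, in which case the CURLOPT_CAINFO line can be dropped.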
