I am working on a simple PHP page which does this:
- Takes a search string from the URL query string (e.g. police officer)
- Appends the search string to a Wikipedia search URL (`https://en.wikipedia.org/w/index.php?search=police+officer`)
- Uses cURL to get the final redirected URL for that search string
- Checks whether the redirected URL contains `index.php?search`
- If it does, does nothing
- Otherwise, explodes the redirected URL and takes the last value from it (`Police_officer`)
- Appends that value to the Wikipedia URL which returns JSON data for that wiki record (`https://en.wikipedia.org/api/rest_v1/page/summary/Police_officer`)
- Uses `file_get_contents()` to read the JSON data and get data back, e.g. the title
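The URL-handling steps in the list above can be isolated into one small function, which makes them easy to check without any network access. This is only a minimal sketch: the name `build_summary_url` is hypothetical, and the `/wiki/Police_officer` form of the redirected URL used in the example is an assumption about what the search redirect produces.

```php
<?php
// Hypothetical helper, not part of the original script.
// Returns the REST summary URL for a redirected Wikipedia URL, or null when
// the URL still contains "index.php?search" (the search did not land on an article).
function build_summary_url(string $redirected_url): ?string
{
    if (strpos($redirected_url, 'index.php?search') !== false) {
        return null;
    }

    // Same approach as described above: split on "/" and keep the last piece.
    $parts = explode('/', $redirected_url);
    $title = end($parts);

    return "https://en.wikipedia.org/api/rest_v1/page/summary/$title";
}

// Assumed example of a redirected URL.
var_dump(build_summary_url('https://en.wikipedia.org/wiki/Police_officer'));
var_dump(build_summary_url('https://en.wikipedia.org/w/index.php?search=police+officer'));
```

Comparing the `strpos()` result with `!== false` avoids depending on the position of the match.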
For some reason, on this line of code:

    $json = file_get_contents($url_json);

where `$url_json` is `https://en.wikipedia.org/api/rest_v1/page/summary/Santa_claus`, I get this error:

    Warning: file_get_contents(https://en.wikipedia.org/api/rest_v1/page/summary/Santa_claus): failed to open stream: HTTP request failed! HTTP/1.1 404 Not Found in C:\xampp\public_html\test.php on line 49

Yet I can go to that URL in the browser and see just the same type of data as I can for this URL: `https://en.wikipedia.org/api/rest_v1/page/summary/Police_officer`

And for that one, `file_get_contents()` returns the data just fine.
I used this code:

    function get_http_response_code($url) {
        $headers = get_headers($url);
        return substr($headers[0], 9, 3);
    }

to confirm that the response code for both pages is 200.
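Note that `substr($headers[0], 9, 3)` reads only the first status line. By default `get_headers()` follows redirects and returns the header lines of every response in one array, so the first status line is not necessarily the final one. A minimal sketch that collects every status code from such an array follows; the function name `status_codes_from_headers` is hypothetical and the sample header lines are illustrative, not captured from a real request.

```php
<?php
// Hypothetical helper, not part of the original script.
// Extracts the status code from every status line in a get_headers()-style array.
function status_codes_from_headers(array $headers): array
{
    $codes = [];
    foreach ($headers as $line) {
        // Status lines look like "HTTP/1.1 200 OK" or "HTTP/2 200".
        if (is_string($line) && preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $line, $m)) {
            $codes[] = (int) $m[1];
        }
    }
    return $codes;
}

// Illustrative input only: a redirect followed by the final response.
$sample = [
    'HTTP/1.1 302 Found',
    'Location: https://example.org/final',
    'HTTP/1.1 200 OK',
    'Content-Type: application/json',
];
print_r(status_codes_from_headers($sample));
```

Passing the real result of `get_headers($url)` to this function would show whether more than one response was involved.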
This is my basic test code:

    $var = $_GET['var'];
    $var = str_replace(" ", "+", $var);
    $url1 = "https://en.wikipedia.org/w/index.php?search=$var";
    echo "<hr /> url1: $url1 <hr />";

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url1);
    curl_setopt($ch, CURLOPT_HEADER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $a = curl_exec($ch);
    $redirected_url = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
    echo "<hr /> url2: $redirected_url <hr />";

    $url_search = strpos($redirected_url, "index.php?search");
    echo "<hr /> url_search: $url_search <hr />";

    function get_http_response_code($url) {
        $headers = get_headers($url);
        return substr($headers[0], 9, 3);
    }

    $url_response = get_http_response_code($redirected_url);
    echo "<hr /> url_response: $url_response <hr />";

    if ($url_search > 0) {
        // do nothing
    } else {
        $tmp = explode('/', $redirected_url);
        $end = end($tmp);
        $url_json = "https://en.wikipedia.org/api/rest_v1/page/summary/$end";
        echo "<hr /> url_json: $url_json <hr />";
        $json = file_get_contents($url_json);
        if ($json) {
            $data = json_decode($json, TRUE);
            if ($data) {
                $wiki_page = $data['content_urls']['desktop']['page'];
                echo "<hr /> wiki_page: $wiki_page <hr />";
            }
        }
    }
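To see what `file_get_contents()` actually receives for the failing URL, one option is to pass it a stream context with `ignore_errors` enabled, so the body is returned even on a 4xx response, and then inspect `$http_response_header`. The sketch below is only a diagnostic aid under stated assumptions: the function names and the User-Agent string are hypothetical placeholders. An explicit User-Agent is set because PHP's HTTP wrapper sends none while the `user_agent` ini setting is empty, which may or may not matter here.

```php
<?php
// Hypothetical helpers, not part of the original script.

// Builds HTTP context options that keep the response body on error statuses.
function diagnostic_context_options(string $user_agent): array
{
    return [
        'http' => [
            'ignore_errors' => true,        // return the body even for 4xx/5xx
            'user_agent'    => $user_agent, // placeholder value, see usage below
        ],
    ];
}

// Fetches a URL and returns the raw response header lines with the body.
function fetch_for_diagnosis(string $url, string $user_agent): array
{
    $context = stream_context_create(diagnostic_context_options($user_agent));
    $body = @file_get_contents($url, false, $context);

    // The HTTP wrapper fills $http_response_header in the calling scope.
    $headers = $http_response_header ?? [];

    return ['headers' => $headers, 'body' => $body];
}

// Example usage (performs a real HTTP request, so it is left commented out):
// $result = fetch_for_diagnosis($url_json, 'MyTestScript/1.0 (hypothetical contact)');
// print_r($result['headers']);
```

Printing the header lines shows every response in the chain, including any redirect that the browser would follow silently.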
What have I missed?