How can I handle a URL that contains special characters when working with cURL? In the code sample below, I have implemented a function that returns the HTTP response code for a URL.

(NB: I know there are other ways to get the response code; this is just to illustrate the problem with special characters in the URL.)

<?php
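function get_http_code($url) {
    // curl_init($url) already sets CURLOPT_URL, so it is not set again below
    $ch = curl_init($url);

    // Request headers only (HEAD), since only the status code is needed
    curl_setopt($ch, CURLOPT_NOBODY, true);

    curl_exec($ch);
    return curl_getinfo($ch, CURLINFO_HTTP_CODE);
}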

// urlA has a special character 'â' 
$urlA = "https://media.winefinder.com/upload/artiklar/bilder/KSU128248-1-Domaine-de-Ferrand-Châteauneuf-du.jpg";

// urlB has no special characters
$urlB = "https://media.winefinder.com/upload/artiklar/bilder/A129749-1-Domaine-Lafage-Princesse-2019.png";

echo "<div>Code A = " . get_http_code($urlA) . "</div>";  // --> 404
echo "<div>Code B = " . get_http_code($urlB) . "</div>";  // --> 200
?>

Both URLs exist, but the first one returns 404 (Not Found) because of the special character in the URL. How can I convert the URL to an encoded URL that will return 200? I have tried urlencode(), rawurlencode(), etc., but I can't make it work.

PS. I have tried to find similar questions on SO, but couldn't find any that helped me here...

Gowire
  • Does this answer your question? [Should I use accented characters in URLs?](https://stackoverflow.com/questions/1386262/should-i-use-accented-characters-in-urls) – mbesson May 25 '21 at 12:12
  • @mbesson No, not really... The URLs are not "mine", so I cannot change them. But I want to be able to validate their existence. So, when the URL contains special characters, it will return 404 even if the URL exists – Gowire May 25 '21 at 12:17
  • Where do you get those URLs from? Because they're invalid… – deceze May 25 '21 at 12:18
  • @deceze The example URL comes from this `https://www.winefinder.com/vin/frankrike/chteauneuf-du-pape/chteauneuf-du-pape-rouge-domaine-de-ferrand-2018-ksu130713` The bottle image on that page has a special character in it... – Gowire May 25 '21 at 12:21
  • Then you have a bit of a problem. The URL is invalid as such. The browser is simply lenient enough to treat the URL correctly anyway, by implicitly encoding it when sending the actual request (you should see the encoded URL when inspecting the actual request in the network inspector). However, the browser can only *guess* at how the URL should be encoded, there's no correct answer. It must be encoded such that the web server will find the requested file; that's why the web server should also do the encoding in the first place. – deceze May 25 '21 at 12:24
  • You could go through the URL and encode *non-ASCII characters* yourself, doing the same guessing that the browser does (probably `rawurlencode` its UTF-8 representation). – deceze May 25 '21 at 12:25
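
The suggestion in the last comment could be sketched roughly as follows. This is a minimal sketch and not a confirmed fix: `encode_non_ascii` is a hypothetical helper name, and it assumes the URL string is UTF-8 and that the server expects the UTF-8 bytes percent-encoded, which is the same guess a browser would make. Only non-ASCII bytes are touched, so `:`, `/`, and any existing `%XX` escapes are left alone (which is where running `rawurlencode()` over the whole URL goes wrong).

```php
<?php
// Hypothetical helper, not from the original post: percent-encode only the
// non-ASCII bytes of a URL and leave every ASCII character as it is.
// Assumption: $url is UTF-8 and the server expects UTF-8 percent-encoding.
function encode_non_ascii(string $url): string {
    $encoded = preg_replace_callback(
        '/[\x80-\xFF]+/',              // runs of bytes outside the ASCII range
        function (array $m): string {
            return rawurlencode($m[0]); // e.g. "â" (bytes C3 A2) becomes "%C3%A2"
        },
        $url
    );

    // preg_replace_callback() returns null on failure; fall back to the input
    return $encoded ?? $url;
}
```

The result could then be passed to the function from the question, for example `get_http_code(encode_non_ascii($urlA))`. Whether that returns 200 depends on how the server actually stores the file name.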
