    <?php
    $url = "https://www.google.pl/search?q=agawa+korzenie&oq=agawa+korzenie";

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    // The user-agent string belongs in CURLOPT_USERAGENT; CURLOPT_HEADER is a
    // boolean that makes curl include the response headers in the output.
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)");
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    //curl_setopt($ch, CURLOPT_ENCODING, "");

    curl_setopt($ch, CURLOPT_HTTPHEADER, array(
        //'Content-Type: application/x-www-form-urlencoded; charset=utf-8',
        'Content-Type: text/html; charset=utf-8',
    ));
    $icerik = curl_exec($ch);
    curl_close($ch);
    echo $icerik;

The encoding of the response is wrong: characters such as ś are turned into a ? sign. How can I fix this?

Sargit

1 Answer


If what you say is really true, then it's a problem with the server, not with curl. But most likely it's not a problem with the server either; it's probably a problem with how you view the result. Here are my theories, from most likely to least likely:

1: You view the result in a web browser, your script doesn't send a charset parameter in its Content-Type response header, and the browser identifies the content as HTML4, where the default charset is ISO-8859-1. It then renders the page as ISO-8859-1, which doesn't support ś, and turns the unrenderable characters into ?. The fix is to send the header Content-Type: text/html; charset=utf-8.

2: Same as above, but your server is actually sending the wrong Content-Type header, e.g. Content-Type: text/html; charset=ISO-8859-1. The fix is the same as above.
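To tell theories 1 and 2 apart, it can help to check which charset the target server actually declares. A minimal sketch: the helper name `charset_from_content_type` is hypothetical, while `curl_getinfo($ch, CURLINFO_CONTENT_TYPE)` is the standard PHP curl call that returns the response's Content-Type value (or null if the server sent none). Note that this is the header curl received from the target server, not the header your own script sends to the browser.

```php
<?php
// Hypothetical helper: extract the charset parameter from a Content-Type
// value such as the one curl_getinfo($ch, CURLINFO_CONTENT_TYPE) returns.
function charset_from_content_type(?string $contentType): ?string
{
    if ($contentType === null) {
        return null; // the server sent no usable Content-Type header
    }
    // Match "charset=value", optionally quoted, case-insensitively.
    if (preg_match('/charset\s*=\s*"?([^";\s]+)"?/i', $contentType, $m)) {
        return strtoupper($m[1]);
    }
    return null; // a Content-Type without a charset parameter
}

// Usage, after curl_exec($ch) and before curl_close($ch):
// $charset = charset_from_content_type(curl_getinfo($ch, CURLINFO_CONTENT_TYPE));
```

If the declared charset is not UTF-8, echoing the body under a `charset=utf-8` header will not display it correctly as-is.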

3: The server stores the data in an SQL database (such as MySQL) with the storage charset set to ISO-8859-1 (or something close to it), and the database replaces characters it can't store with ? (I've seen this many times in the past, though not in recent years). In that case the server code must be fixed; check this answer: https://stackoverflow.com/a/279279/1067003
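The loss described in theory 3 can be reproduced without a database. This sketch is only an analogy: it uses PHP's mbstring extension (assumed to be installed) as a stand-in for the database's charset conversion, which may differ in detail. ISO-8859-1 has no ś, so the conversion substitutes ?, while ISO-8859-2 (Latin-2) can represent it.

```php
<?php
// Use "?" (0x3F) for unrepresentable characters; this is also mbstring's default.
mb_substitute_character(0x3F);

$original = "ś";                                              // UTF-8 input
$latin1   = mb_convert_encoding($original, "ISO-8859-1", "UTF-8"); // ś is lost
$latin2   = mb_convert_encoding($original, "ISO-8859-2", "UTF-8"); // ś is kept (byte 0xB6)

// Converting back to UTF-8 cannot recover a character that was already replaced.
echo mb_convert_encoding($latin1, "UTF-8", "ISO-8859-1"), "\n"; // ?
echo mb_convert_encoding($latin2, "UTF-8", "ISO-8859-2"), "\n"; // ś
```

Once the ? has been stored, no client-side fix can bring the original character back, which is why this case has to be fixed on the server.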

4: You run PHP in a terminal that doesn't support Unicode characters. The solution is to switch to a better terminal. (Not very likely, but hey, xterm is still around and still has a non-Unicode version; you could be using plain xterm.)

5: The server really is running something like $response = str_replace('ś', '?', $response); echo $response; ... Highly unlikely, but not impossible, and it too must be fixed on the server side; check this answer: https://stackoverflow.com/a/279279/1067003

Lastly, a tip: you're mixing up request and response headers. CURLOPT_HTTPHEADER sets the headers curl sends to the target URL in the request, so when you set Content-Type with CURLOPT_HTTPHEADER, you set the Content-Type of the curl request's body. But because you're using neither CURLOPT_INFILE nor CURLOPT_POSTFIELDS, there is no request body at all, so there shouldn't be a Content-Type header in the request; get rid of it. You were probably looking for the header() function, e.g. header('Content-Type: text/html; charset=utf-8');, which sends that header to the browser.
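Putting that advice together, here is a sketch of the question's script with the request Content-Type removed and the response header sent to the browser instead. It assumes the script runs under a web server (header() has no visible effect on the command line, hence the guard), and the function name `fetch_page` is hypothetical. If the characters are still wrong afterwards, check which charset the target server declares for its response.

```php
<?php
// Hypothetical helper wrapping the question's curl calls.
function fetch_page(string $url): string
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    // The user agent goes in CURLOPT_USERAGENT, not CURLOPT_HEADER.
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)");
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // No CURLOPT_HTTPHEADER Content-Type: this GET request has no body to describe.
    $body  = curl_exec($ch);
    $error = curl_error($ch); // read the error before closing the handle
    curl_close($ch);
    if ($body === false) {
        throw new RuntimeException("curl request failed: " . $error);
    }
    return $body;
}

if (PHP_SAPI !== 'cli') {
    // Response header for the browser; it must be sent before any output.
    header('Content-Type: text/html; charset=utf-8');
    echo fetch_page("https://www.google.pl/search?q=agawa+korzenie&oq=agawa+korzenie");
}
```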

hanshenrik