I have a strange issue with file_get_contents in PHP. Here is the code:

$html = file_get_contents("http://www.bratm.sk/trochu-harlem-inspiracie-zaslite-nam-svoj-super-harlem-shake-ak-bude-husty-zverejnime-ho");

It returns something like this:

í}KsŰĆŇčÚŽň!Ç& zRŚ|iI[#)öIkI ĆĺăĹŮÝĹÍý_ŐŃâ[EVßîV%Ů˙ëvĎ ř)śG#óčééééÇĚ çáÜöÁÖÉ;¤áˇ,rřĂăçť[DQ5íeaKÓśOśÉ?ě='šL¸ÔöLßä6ľ4mg_!JĂ÷˘Śu:L§án];9ŇÎV+ި1|C

For some pages on this site I can get the proper encoding and content with iconv, but here I'm stuck. How can I fix this? Thanks.

PayteR

3 Answers

2

That page is in UTF-8. You need to set the header to match:

header('Content-Type: text/html; charset=utf-8');

Matt
  • Thanks, but that doesn't help. Anyway, I don't need to display it; I need to crawl this code. – PayteR Mar 08 '13 at 23:29
  • What exactly are you doing with it? Storing it in the DB? Looking for something within it? Did you see: http://stackoverflow.com/questions/2236668/file-get-contents-breaks-up-utf-8-characters – Matt Mar 08 '13 at 23:30
  • I already tried the solutions in that topic, and they actually work, but not on all pages. For example, try the exact page from my example. And yes, I need to save some content to the DB: the Open Graph data. – PayteR Mar 08 '13 at 23:32
1

I think you're looking for something like this

$opts = array('http' => array('header' => 'Accept-Charset: UTF-8, *;q=0'));
$context = stream_context_create($opts);

$filename = "http://www.bratm.sk/trochu-harlem-inspiracie-zaslite-nam-svoj-super-harlem-shake-ak-bude-husty-zverejnime-ho";
echo file_get_contents($filename, false, $context);
Cobolt
1

Use cURL. This function is an alternative to file_get_contents.

function url_get_contents($Url) {
    // Stop early if the cURL extension is not available.
    if (!function_exists('curl_init')) {
        die('CURL is not installed!');
    }
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $Url);
    // Return the response body as a string instead of printing it.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $output = curl_exec($ch);
    curl_close($ch);
    return $output;
}

$data = url_get_contents("http://www.bratm.sk/trochu-harlem-inspiracie-zaslite-nam-svoj-super-harlem-shake-ak-bude-husty-zverejnime-ho/");
print_r($data);
mkungla
  • With this I get an empty string. – PayteR Mar 08 '13 at 23:38
  • can you try it again please? `function url_get_contents ($Url) { if (!function_exists('curl_init')){ die('CURL is not installed!'); } $ch = curl_init(); curl_setopt($ch, CURLOPT_URL, $Url); curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); $output = curl_exec($ch); curl_close($ch); return $output; }; $html = url_get_contents("http://www.bratm.sk/trochu-harlem-inspiracie-zaslite-nam-svoj-super-harlem-shake-ak-bude-husty-zverejnime-ho/"); echo $html;exit;` – PayteR Mar 08 '13 at 23:50
  • I edited my post and it now contains working code. I also noticed in your comment that you don't need the ";" after the function. – mkungla Mar 08 '13 at 23:57
  • It's totally weird, but now this page works. Another one, `"http://www.bratm.sk/viagra-jedine-vysvetlenie-ako-tam-ta-zena-drzi-d-d/"`, still doesn't. Can you try this one too? – PayteR Mar 09 '13 at 00:03
  • if(curl_errno($ch)) { return 'error:' . curl_error($ch); } http://php.net/manual/en/function.curl-error.php – mkungla Mar 09 '13 at 00:10
  • no error there... can you try this please? `function url_get_contents($Url) { if (!function_exists('curl_init')) { die('CURL is not installed!'); } $ch = curl_init(); curl_setopt($ch, CURLOPT_URL, $Url); curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); $output = curl_exec($ch); if (curl_errno($ch)) { return 'error:' . curl_error($ch); } curl_close($ch); return $output; } $data = url_get_contents("http://www.bratm.sk/viagra-jedine-vysvetlenie-ako-tam-ta-zena-drzi-d-d/"); echo($data);` – PayteR Mar 09 '13 at 10:37
  • I really don't understand this: now this URL works (I didn't change anything), but the next page, `"http://www.bratm.sk/ako-sa-robi-iglu/"`, doesn't. – PayteR Mar 09 '13 at 12:29
  • OK, I finally have it! The problem was that the pages are gzipped, so the solution is to add curl_setopt($ch, CURLOPT_ENCODING, 'gzip'); to the cURL options. Thanks for the help! – PayteR Mar 09 '13 at 13:13
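
For anyone landing here later, the gzip fix from the last comment could be sketched roughly like this. The function names `decode_if_gzipped` and `fetch_html` are made up for illustration and are not from any of the answers. The sketch assumes PHP 7 or newer with the curl and zlib extensions loaded. It sets CURLOPT_ENCODING to an empty string, which asks cURL to advertise every encoding it supports and decompress the response itself, rather than the literal 'gzip' from the comment. The magic-byte check is only a fallback for bodies that still arrive compressed.

```php
<?php
// Hypothetical helper (not from the thread): decompress a body only if it
// starts with the two gzip magic bytes 0x1f 0x8b; otherwise return it as-is.
function decode_if_gzipped(string $body): string
{
    if (strncmp($body, "\x1f\x8b", 2) !== 0) {
        return $body; // does not look like gzip, leave untouched
    }
    // gzdecode() returns false (and raises a warning) on corrupt input,
    // so the warning is suppressed and the raw bytes are returned instead.
    $decoded = @gzdecode($body);
    return $decoded === false ? $body : $decoded;
}

// Hypothetical fetcher (not from the thread) based on the cURL answer above.
function fetch_html(string $url): string
{
    $ch = curl_init($url);
    // Return the response body as a string instead of printing it.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // Empty string: send an Accept-Encoding header listing all encodings
    // cURL supports and let cURL decompress the response.
    curl_setopt($ch, CURLOPT_ENCODING, '');
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL error: ' . curl_error($ch));
    }
    // Fallback in case the body is still compressed.
    return decode_if_gzipped($body);
}
```

Usage would be something like `$html = fetch_html("http://www.bratm.sk/ako-sa-robi-iglu/");`. If you have to stay with file_get_contents, passing its result through `decode_if_gzipped` is one possible workaround under the same assumptions.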