1

I am trying to scrape a webpage in Arabic. Everything works fine, except that when I echo the text I get garbled output, even though I have set the Content-Type header to UTF-8.

Here is my code:

<?php

    // Tell the browser that the output is UTF-8.
    header('Content-Type: text/html; charset=UTF-8');

    require 'vendor/autoload.php';

    use Goutte\Client;

    $client = new Client();

    // Fetch the page and select the news items.
    $crawler = $client->request('GET', 'http://www.lebanonfiles.com');

    $news_container = $crawler->filter('#mcs4_container .line');

    // Print the text of each matched node.
    $news_container->each(function ($node) {

        echo $node->text();

    });
?>

What I get is this piece of garbled text (screenshot).

Bazinga777
  • 5,140
  • 13
  • 53
  • 92
  • Setting the meta header isn't always sufficient. → What headers does Apache send along? → What does your browser/page inspection show in reality? → What happens if you store the output to a file and open it with a UTF-8 editor? → Does Goutte correctly [extract it as UTF-8](http://stackoverflow.com/questions/18782332/can-goutte-guzzle-be-forced-into-utf-8-mode) anyway? → Also, a screenshot isn't very useful; make it a hexdump at least. → Show a bit of your own research by comparing the UTF-8 byte sequences you expect against Unicode tables, etc. – mario Aug 09 '15 at 12:36
  • Try setting the charset in both the HTML and the `PHP`. It may help. – Jo Smo Aug 09 '15 at 12:51
  • This might help you: [Force Goutte/Guzzle to be into UTF-8 mode](http://stackoverflow.com/questions/18782332/can-goutte-guzzle-be-forced-into-utf-8-mode) – Mohammed Abrar Ahmed Jun 26 '16 at 12:00
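
The byte-level check suggested in the comments could look roughly like the following. This is only a minimal sketch: `describeBytes` is a hypothetical helper name, the sample strings are made up for illustration rather than taken from the page in question, and the mbstring extension is assumed to be installed.

```php
<?php
// Hypothetical helper: report whether a string is valid UTF-8
// and show its raw bytes as hexadecimal.
function describeBytes(string $text): array
{
    return [
        'valid_utf8' => mb_check_encoding($text, 'UTF-8'),
        'hex'        => bin2hex($text),
    ];
}

// Arabic letter alef (U+0627) encoded as UTF-8 is the two bytes D8 A7.
$utf8 = describeBytes("\xD8\xA7");

// The same letter in ISO-8859-6 is the single byte C7, which is not valid UTF-8.
$legacy = describeBytes("\xC7");

var_dump($utf8, $legacy);
```

Running the same helper on `$node->text()` would show whether the scraped string is already broken before it is echoed, or whether the bytes are fine and the problem lies in how the browser interprets them.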

2 Answers

1

Try putting this line at the beginning of your PHP file: ini_set('default_charset', 'UTF-8'); This may solve your issue.

Have a nice day.

Alessandro
  • 900
  • 12
  • 23
1
  • ALL settings must be UTF-8, at every level of your application/script.
  • Save the document as UTF-8 or as UTF-8 w/o BOM (if you're using Notepad++, it's Format -> Convert to UTF-8)
    • Note that even though both are UTF-8, they can behave differently! A BOM at the start of a PHP file is sent as output before anything else, which can prevent header() calls from working.
  • The header in both PHP and HTML should be set to UTF-8
    • HTML: <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    • PHP: header('Content-Type: text/html; charset=utf-8');
  • You may need to specify the charset in your php.ini file, using default_charset = "utf-8", although UTF-8 has been the default value since PHP 5.6
  • Everything that can be set to a specific charset should be set to the same one.

There might be different aspects of your code that need to be set to a specific charset.
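
One such aspect is the encoding of the scraped bytes themselves. If the source page is not served as UTF-8, the text has to be converted before it is echoed. The following is a minimal sketch under stated assumptions: `toUtf8` is a hypothetical helper, ISO-8859-6 is used only as an example of a legacy Arabic charset (it is not confirmed to be the encoding of the page in the question), and the mbstring extension is assumed to be available.

```php
<?php
// Hypothetical helper: return $bytes as UTF-8, converting from $sourceCharset
// only when the input is not already valid UTF-8.
function toUtf8(string $bytes, string $sourceCharset): string
{
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes; // already valid UTF-8, leave untouched
    }
    return mb_convert_encoding($bytes, 'UTF-8', $sourceCharset);
}

// Example input: alef + lam encoded in ISO-8859-6 (assumed charset, for illustration).
$legacy = "\xC7\xE4";
$converted = toUtf8($legacy, 'ISO-8859-6');

echo $converted;
```

The validity check is a heuristic: some legacy byte sequences happen to be valid UTF-8 as well, so when the real source charset is known it is safer to convert unconditionally.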

Qirel
  • 25,449
  • 7
  • 45
  • 62