
I am using the Symfony2 DomCrawler component to run XPath queries. Everything works fine except the encoding.

I would like to use UTF-8 encoding, but the Crawler does not seem to use it. I noticed this because `&nbsp;` entities are converted to `Â`, which is a known issue: UTF-8 Encoding Issue

My question is: how can I force the Symfony Crawler to use UTF-8 encoding?

Here is the code I am using:

use Symfony\Component\DomCrawler\Crawler;

$dom_input = new \DOMDocument("1.0","UTF-8");
$dom_input->encoding = "UTF-8";
$dom_input->formatOutput = true;

$dom_input->loadHTMLFile($myFile);

$crawler = new Crawler($dom_input); 
$paragraphs = $crawler->filterXPath('descendant-or-self::p');

Then I dump the text of each paragraph:

foreach($paragraphs as $paragraph) {
    var_dump($paragraph->nodeValue);
}

As soon as a paragraph contains a `&nbsp;`, the output contains `Â`.

Thank you very much in advance.

Milos Cuculovic

1 Answer


Thanks to @halfer, I found a workaround:

Instead of using

$crawler = new Crawler($dom_input);

I used:

$crawler = new Crawler();
$crawler->addHtmlContent(utf8_decode($dom_input->saveXML()));
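For anyone wondering why this works, here is a minimal sketch of the byte-level round trip. It assumes the text inside the DOM was double-encoded, i.e. UTF-8 bytes were misread as ISO-8859-1 and then stored as UTF-8 again. It uses `mb_convert_encoding` from the mbstring extension instead of `utf8_decode`, which is deprecated as of PHP 8.2. The sample string is made up for illustration.

```php
<?php
// Illustrative input (an assumption, not from the original post):
// "a", a non-breaking space, "b". UTF-8 bytes: 61 c2 a0 62.
$original = "a\u{00A0}b";

// Simulate the suspected bug: treat the UTF-8 bytes as ISO-8859-1
// and re-encode them as UTF-8. Byte c2 becomes "Â" (c3 82).
$mojibake = mb_convert_encoding($original, 'UTF-8', 'ISO-8859-1');

// Converting back from UTF-8 to ISO-8859-1 (the same conversion
// utf8_decode performs) peels off the extra layer again.
$repaired = mb_convert_encoding($mojibake, 'ISO-8859-1', 'UTF-8');

echo bin2hex($original), "\n";
echo bin2hex($mojibake), "\n";
echo bin2hex($repaired), "\n";
```

If the three hex dumps show the original bytes, a longer mangled sequence, and the original bytes again, the decode step is simply undoing one round of double encoding.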
Milos Cuculovic
  • I guess that means that `$dom_input->saveXML()` isn't emitting UTF-8 then, which is a bit odd, considering how you've set it up! Still, if you are confident the input format won't change, your workaround should be okay. – halfer Oct 10 '13 at 17:08
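A possible explanation, offered as an assumption rather than a confirmed diagnosis: `saveXML()` does emit UTF-8, but `loadHTMLFile()` may have parsed the file as ISO-8859-1 because the HTML declared no charset, so the text was already mangled inside the DOM. The sketch below avoids the problem at load time by prepending a Content-Type meta tag before parsing. `loadUtf8Html` is a hypothetical helper name, not part of Symfony or the original post, and the sample HTML is invented.

```php
<?php
// Hypothetical helper: parse a UTF-8 HTML string with an explicit
// charset declaration so libxml does not have to guess the encoding.
function loadUtf8Html(string $html): DOMDocument
{
    $dom = new DOMDocument('1.0', 'UTF-8');
    $hint = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">';

    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($hint . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $dom;
}

// Sample input (an assumption): "café", a non-breaking space, "au lait".
$dom = loadUtf8Html("<p>caf\u{00E9}\u{00A0}au lait</p>");

// Same XPath expression as in the question, run with plain DOMXPath.
$xpath = new DOMXPath($dom);
$texts = [];
foreach ($xpath->query('descendant-or-self::p') as $paragraph) {
    $texts[] = $paragraph->nodeValue;
}

var_dump($texts);
```

A `DOMDocument` loaded this way can be passed to `new Crawler($dom)` as in the question. Alternatively, reading the file with `file_get_contents()` and passing the string to `$crawler->addHtmlContent($html, 'UTF-8')` lets the Crawler handle the charset itself.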