
In a Symfony project, I have something like this:

$crawler = new Crawler($this->bigString());
$array = array();
foreach($crawler->filter('.editable') as $domElement )
{
    $innerHtml = $this->getChildHtml($domElement);
    $array[$domElement->getAttribute('id')] = $innerHtml;
    $domElement->nodeValue = '{{ listEditables["' . $domElement->getAttribute('id') . '"] }}';
    $crawler->addNode($domElement);
 }
$page->setEditables($array);
$em->persist($page);
$em->flush();

Where the bigString method returns

<div class="info pmt0 wrap">
  <p id="p_editable_2" class="editable">所谓设计,创于人,且用于人......</p>
  ... a bunch of other html element with Chinese in it ...
</div>

and I use

private function getChildHtml($node) 
{
    $innerHtml= '';
    $children = $node->childNodes;

    foreach( $children as $child )
    {
        $innerHtml .= sprintf( '%s%s', $innerHtml, $child->ownerDocument->saveXML( $child ) );
    }

    return $innerHtml;
}

to get the inner HTML.

I can't get the persisted elements to represent the Chinese characters accurately; all I get is garbled text. Using

 var_dump($innerHtml);

shows that getChildHtml() doesn't return the original Chinese characters. Any idea how I can get it to do so?

Afunakwa

2 Answers


According to this question, sprintf won't help you. Instead, use something like mb_substr().

$innerHtml .= mb_substr($child->ownerDocument->saveXML( $child ), 0);

Other than that, making sure the right encoding is used will most likely solve it.

rndus2r

Turns out the problem was with Symfony Crawler.

As advised here, a good workaround is to write

$crawler = new Crawler();
$crawler->addHtmlContent($this->bigString());

instead of

$crawler = new Crawler($this->bigString());

since the addHtmlContent method uses UTF-8 as its default charset.
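To illustrate the encoding issue without the Crawler, here is a minimal sketch using plain DOMDocument. It assumes the garbling comes from libxml guessing a non-UTF-8 charset when the markup carries no encoding hint. The sample string and the helper names loadUtf8Html() and innerHtml() are made up for this example, and prepending an XML encoding declaration is a common workaround, not necessarily what the Crawler does internally.

```php
<?php
// Hypothetical sample input standing in for bigString().
$html = '<div class="info pmt0 wrap"><p id="p_editable_2" class="editable">所谓设计</p></div>';

// Hypothetical helper: parse an HTML fragment as UTF-8.
function loadUtf8Html(string $html): DOMDocument
{
    libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $doc = new DOMDocument();
    // Without a charset hint, libxml may not treat the bytes as UTF-8.
    // The prepended declaration tells the parser which encoding to use.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    $doc->encoding = 'UTF-8'; // serialize as UTF-8 as well
    libxml_clear_errors();
    return $doc;
}

// Hypothetical helper: concatenate the markup of each child node once.
function innerHtml(DOMNode $node): string
{
    $out = '';
    foreach ($node->childNodes as $child) {
        $out .= $node->ownerDocument->saveXML($child);
    }
    return $out;
}

$doc = loadUtf8Html($html);
$xpath = new DOMXPath($doc);
$p = $xpath->query('//*[@id="p_editable_2"]')->item(0);

echo innerHtml($p), PHP_EOL;
```

If the characters still come out wrong after this, the string was probably not valid UTF-8 to begin with, and it is worth checking with mb_check_encoding($html, 'UTF-8') before parsing.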

Afunakwa