
I'm writing some exports to move content from our custom Symfony CMS into a new site built on Craft CMS.

To break an article down into separate fields, I'm using DOMDocument to iterate over its elements. This seems to work: dumping the content in the controller and then exiting shows what I'd expect. But when I prepare a response, pass the data to the view template and output it there, I get malformed characters.

I'm assuming this has something to do with character encoding, but I'm not sure where it's going wrong.

The code in the controller:

$response = new Response();
$response->headers->set('Content-Type', 'application/xml; charset=UTF-8');

return $this->render('FrontBundle:Export:articles.html.twig', array(
    'pages'     => $pages
), $response);

As I said, $pages has been prepared with DOMDocument and looks as expected when dumped from the controller, e.g.:

it’s tough, it’s arduous and it’s laborious

When this is passed to the view, I then get this output:

itâs tough, itâs arduous and itâs laborious

That output comes from this template:

<?xml version="1.0" encoding="UTF-8"?>
<pages>
    {% for page in pages %}
        <page>
            <title>{{ page.title }}</title>
            <intro>{{ page.article.intro|join('') }}</intro>
            <article>
                {% for block in page.article.body %}
                    <block>
                        <heading>{{ block.heading }}</heading>
                    </block>
                    <block>
                        <content>{{ block.content }}</content>
                    </block>
                {% endfor %}
            </article>
        </page>
    {% endfor %}
</pages>

I don't think it's an XML issue, though, as the same thing happens if I don't set the Content-Type and strip the view down to just {{ dump(pages) }}.

So I'm completely flummoxed - any suggestion would be greatly appreciated!

Jammooka
  • I guess your database/table doesn't use a utf8 charset. See [this answer](https://stackoverflow.com/a/3854705/4433067). – Jenne Jun 08 '17 at 16:12
  • Possible duplicate of [UTF-8 all the way through](https://stackoverflow.com/questions/279170/utf-8-all-the-way-through) – miken32 Jun 08 '17 at 18:17

1 Answer

Off the back of one of the comments, I discovered that UTF-8 needs to be specified at every stage of the pipeline, not just once.

After that, I echoed out the content after every operation, which showed that DOMDocument was to blame.

For anyone having the same issue, the answer to "PHP DOMDocument loadHTML not encoding UTF-8 correctly" is what fixed it for me.

Specifically, $dom->loadHTML(mb_convert_encoding($content, 'HTML-ENTITIES', 'UTF-8'));
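
As a rough illustration of why that conversion helps, here is a minimal sketch. The sample string is hypothetical, and it swaps in mb_encode_numericentity() for the 'HTML-ENTITIES' conversion, since the latter is deprecated as of PHP 8.2. The idea is the same: turn every non-ASCII character into a numeric entity so that loadHTML() no longer has to guess the encoding of the input bytes.

```php
<?php
// Hypothetical sample input; the apostrophes are U+2019 encoded as UTF-8.
$content = '<p>it’s tough, it’s arduous and it’s laborious</p>';

// Collect libxml parse warnings instead of printing them.
libxml_use_internal_errors(true);

// Replace every character from U+0080 upwards with a numeric entity
// (e.g. &#8217;), leaving the ASCII markup untouched.
$map = [0x80, 0x10FFFF, 0, 0x1FFFFF];
$asciiOnly = mb_encode_numericentity($content, $map, 'UTF-8');

$dom = new DOMDocument();
$dom->loadHTML($asciiOnly);

// DOMDocument returns node text as UTF-8.
$fixedText = $dom->getElementsByTagName('p')->item(0)->textContent;

echo $fixedText, PHP_EOL;
```

Another commonly suggested approach is to prepend an encoding hint such as `<?xml encoding="UTF-8">` to the markup before loading it. Either way, the point is to tell the parser what the bytes mean rather than letting it assume a single-byte encoding.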

Also note that you apparently can't rely on var_dump to check whether the content is OK: even after DOMDocument had mangled the text, var_dump still showed it correctly, and only looping over the content and echoing it out revealed the issue.
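
If visual inspection can't be trusted, one option is to look at the raw bytes instead. The sketch below is only an illustration with a made-up string: it simulates the damage by re-encoding UTF-8 bytes as if they were ISO-8859-1 (which is an assumption about what went wrong, not something confirmed above) and compares the hex dumps.

```php
<?php
// Hypothetical sample: "it’s" with U+2019 as the apostrophe.
$good = "it\u{2019}s";

// Simulate the corruption: treat the UTF-8 bytes as ISO-8859-1
// and convert them to UTF-8 a second time.
$bad = mb_convert_encoding($good, 'UTF-8', 'ISO-8859-1');

// bin2hex() shows the actual bytes, regardless of how a terminal
// or browser chooses to render them.
$goodHex = bin2hex($good); // 6974e2809973
$badHex  = bin2hex($bad);  // 6974c3a2c280c29973

echo $goodHex, PHP_EOL, $badHex, PHP_EOL;
```

Both strings are valid UTF-8, so a validity check alone would not tell them apart; the three-byte sequence e2 80 99 for the apostrophe has become six bytes in the damaged copy.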

Jammooka