I'm trying to save UTF-8 characters (such as Ť and š) with Zend_Cache, but Zend_Cache seems to garble them: they are saved as Å, ¾ and other unexpected characters.
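
For reference, this symptom is what you get when UTF-8 bytes are read as ISO-8859-1 somewhere along the way. A minimal sketch of that effect, assuming the iconv extension is available (the sample string is hypothetical, not taken from the real data):

```php
<?php
// "Ť", "š" and "ž" written as raw UTF-8 bytes (hypothetical sample input).
$utf8 = "\xC5\xA4\xC5\xA1\xC5\xBE";

// Treat every byte as one ISO-8859-1 character and re-encode it as UTF-8.
// This mimics a layer that wrongly assumes its input is Latin-1.
$garbled = iconv('ISO-8859-1', 'UTF-8', $utf8);

echo bin2hex($utf8), "\n";    // c5a4c5a1c5be
echo $garbled, "\n";          // Å¤Å¡Å¾
echo bin2hex($garbled), "\n"; // c385c2a4c385c2a1c385c2be
```

Comparing `bin2hex()` output on localhost and on the server at each step (after fetching, after parsing, after loading from the cache) should show which step changes the bytes.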

Here is the snippet that saves the data to the cache. The UTF-8 characters are garbled only on the live server; on localhost on my PC it works fine:

// cache the external data
$data = array('nextRound' => $nextRound,
              'nextMatches' => $nextMatches,
              'leagueTable' => $leagueTable);
$cache = Zend_Registry::get('cache');
$cache->save($data, 'externalData');

Before I save the cached data, I purify it with HTMLPurifier and do some parsing with DOM, something like this:

    // fetch the HTML from external server
    $html = file_get_contents('http://www.example.com/test.html');

    // purify the HTML so we can load it with DOM
    include BASE_PATH . '/library/My/htmlpurifier-4.0.0-standalone/HTMLPurifier.standalone.php';
    $config = HTMLPurifier_Config::createDefault();
    $config->set('HTML.Doctype', 'XHTML 1.0 Strict');
    $purifier = new HTMLPurifier($config);
    $html = $purifier->purify($html);

    $dom = new DOMDocument();
    // hack to preserve UTF-8 characters
    $dom->loadHTML('<?xml encoding="UTF-8">' . $html);
    $dom->preserveWhiteSpace = false;

    // some parsing here
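
One way to check whether the DOM step keeps the bytes intact is to load a small known string and compare what comes back. The sketch below is an assumption-laden test, not the code from the site: it declares the charset with a meta tag (an alternative to the XML-declaration prefix used above), and the sample fragment is hypothetical.

```php
<?php
// Hypothetical sample input: "Ťš" as raw UTF-8 bytes inside a paragraph.
$fragment = "<p>\xC5\xA4\xC5\xA1</p>";

// Assumption: wrap the fragment in a full document that declares its
// charset in a meta tag, so the HTML parser does not guess the encoding.
$page = '<html><head>'
      . '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
      . '</head><body>' . $fragment . '</body></html>';

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($page);

// DOM hands node text back as UTF-8, so the bytes should match the input.
$text = $dom->getElementsByTagName('p')->item(0)->textContent;
echo bin2hex($text), "\n";
var_dump($text === "\xC5\xA4\xC5\xA1");
```

If the comparison fails on the server but succeeds on localhost, the parsing step rather than the cache is the likely place where the encoding changes.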

Here is how I initialize Zend_Cache in the bootstrap file:

protected function _initCache()
{
    $frontend= array('lifetime' => 7200,
                     'automatic_serialization' => true);
    $backend= array('cache_dir' => 'cache');
    $this->cache = Zend_Cache::factory('core',
                                       'File',
                                       $frontend,
                                       $backend);
}
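
With `automatic_serialization` enabled, the data is passed through PHP's `serialize()` before it is written. PHP strings are plain byte sequences, so serialization on its own should not change the encoding. A rough stand-in for the save/load cycle, using a temporary file instead of Zend_Cache (the file handling here is an assumption for illustration, not the File backend's actual code):

```php
<?php
// Hypothetical data containing "Ťš" as raw UTF-8 bytes.
$data = array('nextRound' => "\xC5\xA4\xC5\xA1");

// Write the serialized array to a temporary file and read it back.
$file = tempnam(sys_get_temp_dir(), 'cache');
file_put_contents($file, serialize($data));
$loaded = unserialize(file_get_contents($file));
unlink($file);

var_dump($loaded === $data);               // bool(true)
echo bin2hex($loaded['nextRound']), "\n";  // c5a4c5a1
```

Running the same byte comparison around the real `$cache->save()` and `$cache->load()` calls would confirm whether the cache returns exactly what was stored.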

Any ideas? It works on localhost (where I have support for the foreign language used in the HTML) but not on the server.

Charles
Richard Knop
  • Are you absolutely sure it's not the Purifier processing that's messing it up? What happens if you eliminate that part of the process? – prodigitalson Feb 10 '10 at 00:38
  • I'm 99% sure. I tried eliminating the HTMLPurifier part of the process and the problem persists. – Richard Knop Feb 10 '10 at 00:45
  • Post your mbstring configuration in php.ini on localhost and server, please – mike Feb 12 '10 at 20:12
  • Hi Michal. I have actually found out it was DOM that was messing with UTF-8 characters, see this question: http://stackoverflow.com/questions/2236889/why-does-dom-change-encoding – Richard Knop Feb 12 '10 at 21:25

1 Answer

I had a similar problem with an FPDF deployment. There, the HTML non-breaking space entity &nbsp; was being converted into the same Å character that you're getting here. It worked fine on my local Windows machine, but not in my Linux server environment.

Try this:

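// Pass the charset explicitly: html_entity_decode()'s default differs between PHP versions.
$str = iconv('UTF-8', 'windows-1252', html_entity_decode($str, ENT_COMPAT, 'UTF-8'));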
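
To illustrate what that conversion does to the non-breaking space, here is a small sketch with a made-up input string, assuming the iconv extension is available and recognizes the `windows-1252` charset name:

```php
<?php
// Hypothetical input containing the non-breaking space entity.
$str = 'a&nbsp;b';

// Decode the entity to UTF-8: U+00A0 becomes the two bytes C2 A0.
$decoded = html_entity_decode($str, ENT_COMPAT, 'UTF-8');

// Convert to Windows-1252, where the same character is the single byte A0.
$converted = iconv('UTF-8', 'windows-1252', $decoded);

echo bin2hex($decoded), "\n";   // 61c2a062
echo bin2hex($converted), "\n"; // 61a062
```

Note that this only helps when the target really expects Windows-1252. That code page contains š and ž but not Ť, so for the characters in the question the conversion may fail or lose data.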

Joel Small