2

I am reading an RSS feed: http://beersandbeans.com/feed/

The feed says it is UTF-8, and I am using SimplePie to import the content. When I grab the content and store it in $content, I perform the following:

<?php
header ('Content-type: text/html; charset=utf-8');
?>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en"><head> 
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
</head><body>
<?php
echo $content;
echo $enc = mb_detect_encoding($content, "UTF-8,ISO-8859-1", true);
echo $content = mb_convert_encoding($content, "UTF-8", $enc);
echo $enc = mb_detect_encoding($content, "UTF-8,ISO-8859-1", true);
?>
</body></html>

This then produces:

..... Camping:     2,000isk/day for 5 days) = $89 .....
ISO-8859-1
..... Camping: Â  Â           2,000isk/day for 5 days) = $89 .....
UTF-8

Why is it outputting the Â?

Mark Stosberg
Lizard

2 Answers

2

Try not specifying "UTF-8,ISO-8859-1" and see what encoding it gives you. It might be detecting ISO-8859-1 because it's the last one in that list, rather than the actual encoding of the string.
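A minimal sketch of that check, assuming $content holds the raw feed text. The sample string below is made up, with a UTF-8 non-breaking space standing in for the feed's spacing. It prints the configured detection order and the encoding mbstring picks when no candidate list is passed, then uses mb_check_encoding() to test directly whether the bytes are well-formed UTF-8:

```php
<?php
// Made-up sample: "\xC2\xA0" is a non-breaking space encoded as UTF-8.
$content = "Camping:\xC2\xA0 2,000isk/day for 5 days";

// The order used when no candidate list is given (depends on configuration).
echo implode(', ', mb_detect_order()), "\n";

// Let mbstring choose from its default order instead of "UTF-8,ISO-8859-1".
var_dump(mb_detect_encoding($content));

// Independent check: are the bytes well-formed UTF-8?
$isUtf8 = mb_check_encoding($content, 'UTF-8');
var_dump($isUtf8);
```

Any byte sequence is valid ISO-8859-1, so detection can never rule that encoding out. mb_check_encoding() answers the narrower question of whether the string is already valid UTF-8.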

Griff
0

Set the strict parameter to true in mb_detect_encoding(); see http://www.php.net/manual/de/function.mb-detect-encoding.php#102510

Also try http://www.php.net/manual/de/function.mb-convert-encoding.php instead of iconv()
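As a rough sketch of how strict detection and mb_convert_encoding() might be combined: the helper name ensure_utf8 and the ISO-8859-1 fallback are assumptions for illustration, not part of SimplePie or mbstring. The idea is to convert only when the input is not already well-formed UTF-8, so valid UTF-8 is never converted a second time:

```php
<?php
// Hypothetical helper: returns $content as UTF-8.
// Assumption: anything that is not well-formed UTF-8 is in $fallback (ISO-8859-1 here).
function ensure_utf8(string $content, string $fallback = 'ISO-8859-1'): string
{
    // Strict detection against UTF-8 only; returns false for malformed input.
    if (mb_detect_encoding($content, 'UTF-8', true) === 'UTF-8') {
        return $content; // already UTF-8, leave the bytes alone
    }
    return mb_convert_encoding($content, 'UTF-8', $fallback);
}

echo bin2hex(ensure_utf8("\xC2\xA0")), "\n"; // c2a0 (already UTF-8, unchanged)
echo bin2hex(ensure_utf8("\xE9")), "\n";     // c3a9 (ISO-8859-1 "é" converted)
```

If the feed can arrive in encodings other than these two, the fallback would need to come from somewhere more reliable, such as the encoding declared by the feed itself.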

Tobias
  • Tried both of these and no luck :( – Lizard Apr 12 '11 at 12:07
  • hm, works for me... please edit your code above with your new try – Tobias Apr 12 '11 at 12:08
  • and what about this: echo $content = mb_convert_encoding($content, "UTF-8"); (without the optional third param) – Tobias Apr 12 '11 at 12:19
  • 'strict' specifies whether to use strict encoding detection. Outside strict mode, some characters can be labeled wrongly because they are valid in more than one character set. – Tobias Apr 23 '14 at 10:35
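
The symptom in the question is consistent with text that is already UTF-8 being converted from ISO-8859-1 a second time. The sketch below reproduces that with a single non-breaking space. That the feed contains non-breaking spaces is an assumption based on the posted output, not something confirmed in the thread:

```php
<?php
// A non-breaking space (U+00A0) encoded as UTF-8 is the two bytes C2 A0.
$nbsp = "\xC2\xA0";

// Treating those bytes as ISO-8859-1 reads C2 as "Â" and A0 as a non-breaking
// space; each character is then re-encoded as UTF-8.
$double = mb_convert_encoding($nbsp, 'UTF-8', 'ISO-8859-1');

echo bin2hex($nbsp), "\n";   // c2a0
echo bin2hex($double), "\n"; // c382c2a0, displayed as "Â" plus a non-breaking space

// Reversing the mistaken conversion restores the original bytes.
$fixed = mb_convert_encoding($double, 'ISO-8859-1', 'UTF-8');
var_dump($fixed === $nbsp);  // bool(true)
```

Under that assumption, the first mb_detect_encoding() call reporting ISO-8859-1 is what triggers the extra conversion, and skipping the conversion for content that is already valid UTF-8 avoids the stray Â.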