The following script does not always correctly catch and convert foreign characters. Could someone show me what I'm missing to get it to be more robust?

<?php
include("../index_head.inc.php");
$content = implode("",(@file("current.txt")));

$url = "http://XXXXXX.html?no_body=1";
$content = file_get_contents($url,'r');

if (isset($_GET['showcurrent']) && $_GET['showcurrent'] == '')
    {
    $content = substr($content,1,strpos($content,"<hr ")-1);
    }
    else 
    {
    $content = str_replace("<br style=\"clear:both\" />\n</p>", "</p>",$content);   
    $content = str_replace("ck1\"><img", "ck1\" target=_blank><img",$content);  
    };

$content = str_replace("<h3>current</h3>", "",$content);

echo "<div id=\"service\" style=\"width: 660px;padding-left:5px\">",str_replace("current.html","current.html",$content),"</div>";

include("../index_footer.inc.php");
?>

New information: Pekka, you gave me the idea to check how the page renders without str_replace():

<?php
include("../index_head.inc.php");
$content = implode("",(@file("current.txt")));
$url = "XXXXXX.html?no_body=1";
$content = file_get_contents($url,'r');
echo "<div id=\"service\" style=\"width: 660px;padding-left:5px\">",$content,"</div>";

It seems the problem lies elsewhere because I get the same mangling even without using str_replace()! If you can help me get this sorted out, I would sure appreciate it. I have seen your wish list. ;)

Carey G. Butler
  • That's why PHP provides the [multibyte string library](http://www.php.net/manual/en/book.mbstring.php)... str_replace, substr, etc. work on bytes, not characters – Mark Baker Oct 14 '13 at 00:21
  • @MarkBaker there's no mb_str_replace though. – Sébastien Oct 14 '13 at 00:34
  • Thanks Mark and Sébastien. I used your comments to find this: http://stackoverflow.com/questions/3786003/str-replace-on-multibyte-strings-dangerous – Carey G. Butler Oct 14 '13 at 00:36
  • I tried the mb_replace function described in 3786003, but it didn't change a thing. The problem remains. – Carey G. Butler Oct 14 '13 at 00:49
  • I do not see any instance in your code where it tries to replace an umlaut? What exactly fails where? – Pekka Oct 14 '13 at 05:05
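
To illustrate the bytes-versus-characters point from the comments, here is a minimal sketch. It assumes the mbstring extension is loaded and that the text is UTF-8; the sample string is made up for illustration and does not come from the page in question.

```php
<?php
// "Grüße" encoded as UTF-8, written with byte escapes so the example
// does not depend on how this file itself is saved.
$text = "Gr\xC3\xBC\xC3\x9Fe";

$byteLength = strlen($text);              // counts bytes
$charLength = mb_strlen($text, 'UTF-8');  // counts characters

// substr() cuts after 3 bytes, which splits the two-byte "ü" in half.
$byteCut = substr($text, 0, 3);
// mb_substr() cuts after 3 characters and keeps "ü" intact.
$charCut = mb_substr($text, 0, 3, 'UTF-8');

// Check whether each cut is still well-formed UTF-8.
$byteCutValid = mb_check_encoding($byteCut, 'UTF-8');
$charCutValid = mb_check_encoding($charCut, 'UTF-8');
```

Note that str_replace() is generally safe when both the search string and the subject are valid UTF-8, as the linked question discusses. The offset-based functions such as substr() and strpos() are the ones that can land in the middle of a character.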

2 Answers

Did you declare the charset in PHP?

Try sending this header before any output:

header('Content-Type: text/html; charset=utf-8');

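If that does not work, check whether the content is already UTF-8 before you run str_replace on it. If it is ISO-8859-1, convert it to UTF-8:

$content = utf8_encode($content);

In the opposite case (the content is UTF-8 but the page is served as ISO-8859-1), use:

$content = utf8_decode($content);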
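
The check-then-convert idea above could be sketched roughly like this. This is only a sketch under assumptions: `to_utf8` is a hypothetical helper name, the mbstring extension must be loaded, and anything that is not valid UTF-8 is assumed to be ISO-8859-1, which may not hold for your source page.

```php
<?php
/**
 * Hypothetical helper (not from the original post): return $html as UTF-8.
 * Assumes that anything that is not valid UTF-8 is ISO-8859-1.
 */
function to_utf8(string $html): string
{
    if (mb_check_encoding($html, 'UTF-8')) {
        return $html; // already valid UTF-8, leave untouched
    }
    return mb_convert_encoding($html, 'UTF-8', 'ISO-8859-1');
}

// Sample inputs, written with byte escapes so the file encoding is irrelevant.
$latin1 = "M\xFCnchen";      // "München" in ISO-8859-1
$utf8   = "M\xC3\xBCnchen";  // "München" in UTF-8

$fromLatin1 = to_utf8($latin1); // converted to UTF-8
$fromUtf8   = to_utf8($utf8);   // returned unchanged
```

In the script from the question you would then echo to_utf8($content) and make sure the charset header is sent before any output, including anything the included header file prints.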

Hope it helps!

SBO

Thank you SBO - It sure did help! I simply changed the code to:

<?php
include("../index_head.inc.php");
$content = implode("",(@file("current.txt")));

$url = "http://XXXXXX.html?no_body=1";
$content = file_get_contents(utf8_encode($url),'r');

if (isset($_GET['showcurrent']) && $_GET['showcurrent'] == '')
    {
    $content = substr($content,1,strpos($content,"<hr ")-1);
    }
else
    {
    $content = str_replace("<br style=\"clear:both\" />\n</p>", "</p>",$content);
    $content = str_replace("ck1\"><img", "ck1\" target=_blank><img",$content);
    };

$content = str_replace("<h3>current</h3>", "",$content);

echo "<div id=\"service\" style=\"width: 660px;padding-left:5px\">",str_replace("current.html","current.html",utf8_decode($content)),"</div>";

include("../index_footer.inc.php");
?>

and everything is working fine. Thank you very much for your help.
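
For anyone reading later, here is a minimal sketch of what the two added calls probably do. It uses mb_convert_encoding() in place of utf8_encode()/utf8_decode(), which perform the same ISO-8859-1 conversions but are deprecated as of PHP 8.2. The URL and sample text are hypothetical, and the mbstring extension is assumed.

```php
<?php
// Hypothetical sample data, not taken from the original page.
$url  = "http://example.com/page.html?no_body=1"; // plain ASCII
$utf8 = "Gr\xC3\xBC\xC3\x9Fe";                    // "Grüße" as UTF-8

// Converting an ASCII-only string from ISO-8859-1 to UTF-8 changes nothing,
// so wrapping the URL this way has no effect.
$encodedUrl = mb_convert_encoding($url, 'UTF-8', 'ISO-8859-1');

// Converting UTF-8 to ISO-8859-1 turns each two-byte umlaut into one byte.
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');

// The euro sign has no ISO-8859-1 equivalent, so it is lost in the conversion.
$euro = mb_convert_encoding("\xE2\x82\xAC", 'ISO-8859-1', 'UTF-8');
```

If that reading is right, the fix works because the fetched content is UTF-8 while the surrounding page is served as ISO-8859-1. The trade-off is that any character outside ISO-8859-1 is lost, so serving the whole page as UTF-8 may be the more robust option.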

Carey G. Butler