str_replace PHP script can't handle foreign characters such as umlauts robustly

Question

The following script does not always correctly catch and convert foreign characters. Could someone show me what I'm missing to get it to be more robust?

<?php
include("../index_head.inc.php");
$content = implode("",(@file("current.txt")));

$url = "http://XXXXXX.html?no_body=1";
$content = file_get_contents($url,'r');

if (isset($_GET['showcurrent']) && $_GET['showcurrent'] == '')
    {
    $content = substr($content,1,strpos($content,"<hr ")-1);
    }
    else 
    {
    $content = str_replace("<br style=\"clear:both\" />\n</p>", "</p>",$content);   
    $content = str_replace("ck1\"><img", "ck1\" target=_blank><img",$content);  
    };

$content = str_replace("<h3>current</h3>", "",$content);

echo "<div id=\"service\" style=\"width: 660px;padding-left:5px\">",str_replace("current.html","current.html",$content),"</div>";

include("../index_footer.inc.php");
?>

New information: Pekka, you gave me the idea to check how the page emits without str_replace():

<?php
include("../index_head.inc.php");
$content = implode("",(@file("current.txt")));
$url = "XXXXXX.html?no_body=1";
$content = file_get_contents($url,'r');
echo "<div id=\"service\" style=\"width: 660px;padding-left:5px\">",$content,"</div>";

It seems the problem lies elsewhere because I get the same mangling even without using str_replace()! If you can help me get this sorted out, I would sure appreciate it. I have seen your wish list. ;)

That's why PHP provides the [multibyte string library](http://www.php.net/manual/en/book.mbstring.php)... str_replace, substr, etc work on bytes, not characters — Mark Baker, Oct 14 '13 at 00:21
Thanks Mark and Sébastien. I used your comments to find this: ( http://stackoverflow.com/questions/3786003/str-replace-on-multibyte-strings-dangerous) — Carey G. Butler, Oct 14 '13 at 00:36
I tried the mb_replace function described in 3786003, but it didn't change a thing. The problem remains. — Carey G. Butler, Oct 14 '13 at 00:49
I do not see any instance in your code where it tries to replace an umlaut? What exactly fails where? — Pekka, Oct 14 '13 at 05:05
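
To illustrate Mark Baker's comment above about bytes versus characters, here is a minimal sketch. It assumes the mbstring extension is loaded and that the text is UTF-8; the sample word and variable names are made up for the demonstration. The umlaut is written as a byte escape so the result does not depend on how the script file itself is saved.

```php
<?php
// "für" encoded as UTF-8: the "ü" occupies two bytes (0xC3 0xBC).
$word = "f\xC3\xBCr";

$byteLength = strlen($word);              // counts bytes
$charLength = mb_strlen($word, 'UTF-8');  // counts characters

// Cutting at a byte offset can split the two-byte "ü" in half.
$byteCut = substr($word, 0, 2);              // "f" plus a dangling 0xC3
$charCut = mb_substr($word, 0, 2, 'UTF-8');  // "f" plus the complete "ü"

$byteCutIsValid = mb_check_encoding($byteCut, 'UTF-8');
$charCutIsValid = mb_check_encoding($charCut, 'UTF-8');

// str_replace() compares byte sequences, so it stays safe as long as
// subject, search and replacement are all valid UTF-8.
$replaced = str_replace("\xC3\xBC", 'ue', $word);

printf("%d bytes, %d characters\n", $byteLength, $charLength);
var_dump($byteCutIsValid, $charCutIsValid, $replaced);
```

In the question's script, substr() and strpos() cut at the ASCII marker `<hr `, so they are unlikely to split a multibyte character there. That fits the later observation that the mangling appears even without str_replace().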

score 3 · Accepted Answer · answered Oct 14 '13 at 07:03

Did you declare the character set in PHP?

Try this:

header('Content-Type: text/html; charset=utf-8');

If that does not work, check whether your content is already UTF-8 before the str_replace calls. If it is ISO-8859-1, convert it to UTF-8 with:

utf8_encode($data);

In the opposite case (UTF-8 content that has to be output as ISO-8859-1), use:

utf8_decode($data);

Hope it helps!
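
The steps in this answer can be sketched as one small routine. This is only an illustration under stated assumptions: `to_utf8()` is a hypothetical helper, the two sample strings stand in for whatever file_get_contents() returns, and any input that is not valid UTF-8 is assumed to be ISO-8859-1. It uses mb_check_encoding() and mb_convert_encoding() from the mbstring extension, because utf8_encode() and utf8_decode() are deprecated as of PHP 8.2.

```php
<?php
/**
 * Hypothetical helper, not part of the original answer: return $text as UTF-8.
 * Assumption: input that is not valid UTF-8 is ISO-8859-1.
 */
function to_utf8(string $text): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text; // already UTF-8; converting again would double-encode
    }
    return mb_convert_encoding($text, 'UTF-8', 'ISO-8859-1');
}

// Stand-ins for what file_get_contents($url) might return.
$latin1Page = "<p>M\xFCnchen</p>";      // "ü" as the single Latin-1 byte 0xFC
$utf8Page   = "<p>M\xC3\xBCnchen</p>";  // "ü" as the UTF-8 pair 0xC3 0xBC

// Declare the encoding before any output so the browser decodes UTF-8.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

$fromLatin1 = to_utf8($latin1Page);
$fromUtf8   = to_utf8($utf8Page);

// What happens without the check: UTF-8 input treated as Latin-1 is
// encoded a second time, which renders as "MÃ¼nchen".
$doubleEncoded = mb_convert_encoding($utf8Page, 'UTF-8', 'ISO-8859-1');

echo $fromLatin1, "\n", $fromUtf8, "\n";
```

Checking before converting matters because converting content that is already UTF-8 produces a typical form of mangled umlauts.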

answered by SBO

It did! Thank you SBO. – Carey G. Butler Oct 14 '13 at 17:55

score 0 · Answer 2 · answered Oct 14 '13 at 17:52

Thank you SBO - It sure did help! I simply changed the code to:

<?php
include("../index_head.inc.php");
$content = implode("",(@file("current.txt")));

$url = "http://XXXXXX.html?no_body=1";
$content = file_get_contents(utf8_encode($url),'r');

if (isset($_GET['showcurrent']) && $_GET['showcurrent'] == '')
    {
    $content = substr($content,1,strpos($content,"<hr ")-1);
    }
    else
    {
    $content = str_replace("<br style=\"clear:both\" />\n</p>", "</p>",$content);
    $content = str_replace("ck1\"><img", "ck1\" target=_blank><img",$content);
    };

$content = str_replace("<h3>current</h3>", "",$content);

echo "<div id=\"service\" style=\"width: 660px;padding-left:5px\">",str_replace("current.html","current.html",utf8_decode($content)),"</div>";

include("../index_footer.inc.php");
?>

and everything is working fine. Thank you very much for your help.
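
One plausible reading of why this version works, offered as an assumption because the included header and footer files are not shown: the fetched page is UTF-8, while the surrounding template is served as ISO-8859-1, so converting on output makes the bytes match the declared charset. The sketch below reproduces that conversion with mb_convert_encoding(), which performs the same UTF-8 to ISO-8859-1 step without the function deprecated in PHP 8.2. The sample text and the example.com URL are placeholders.

```php
<?php
// Assumption for illustration: the fetched page is UTF-8 and the
// surrounding template is served as ISO-8859-1.
$fetched = "Gr\xC3\xBC\xC3\x9Fe";   // "Grüße" in UTF-8 (7 bytes)

// Same conversion utf8_decode() performs, without the deprecated function.
$forLatin1Page = mb_convert_encoding($fetched, 'ISO-8859-1', 'UTF-8');

// A URL made only of ASCII characters has identical bytes in both
// encodings, so encoding it changes nothing.
$url = 'http://example.com/page.html?no_body=1';   // placeholder URL
$urlUnchanged = mb_convert_encoding($url, 'UTF-8', 'ISO-8859-1') === $url;

echo bin2hex($forLatin1Page), "\n";
```

If that reading is right, the utf8_encode() call around the URL has no effect, and the conversion on output is what fixes the display.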
