53

Some of my scripts use different encodings, and this has become an issue when I try to combine them.

I can't change the encoding they use. Instead, I want to change the encoding of the result from script A and use it as a parameter in script B.

So: is there any simple way to convert a string from UTF-8 to ISO-8859-1 in PHP? I have looked at utf8_encode and utf8_decode, but they don't do what I want. Why doesn't a "utf2iso()" function, or something similar, exist?

I don't think I have characters that can't be written in ISO-8859-1, so that shouldn't be a huge issue.

mat
qualbeen
  • 1
    utf8_decode should exactly be your utf2iso?!? – BlaM Dec 17 '08 at 13:05
  • It's worth noting that PHP continues to move to utf-8 internally so any strings you have probably are coming from outside. Set cURL, file access functions, streams, PDO/MySQL, or any other API for accessing outside data to use UTF-8 so that it will already be correct when PHP gets it. – Xeoncross Oct 22 '15 at 19:36

10 Answers

137

Have a look at iconv() or mb_convert_encoding(). Just by the way: why don't utf8_encode() and utf8_decode() work for you?

utf8_decode — Converts a string with ISO-8859-1 characters encoded with UTF-8 to single-byte ISO-8859-1

utf8_encode — Encodes an ISO-8859-1 string to UTF-8

So essentially

$utf8 = 'ÄÖÜ'; // file must be UTF-8 encoded
$iso88591_1 = utf8_decode($utf8);
$iso88591_2 = iconv('UTF-8', 'ISO-8859-1', $utf8);
$iso88591_3 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');

$iso88591 = 'ÄÖÜ'; // file must be ISO-8859-1 encoded
$utf8_1 = utf8_encode($iso88591);
$utf8_2 = iconv('ISO-8859-1', 'UTF-8', $iso88591);
$utf8_3 = mb_convert_encoding($iso88591, 'UTF-8', 'ISO-8859-1');

all should do the same - with utf8_en/decode() requiring no special extension, mb_convert_encoding() requiring ext/mbstring and iconv() requiring ext/iconv.
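Because utf8_encode() and utf8_decode() are deprecated as of PHP 8.2, a thin wrapper around mb_convert_encoding() may be the safer long-term option. The following is only a sketch: the helper names `utf8_to_latin1` and `latin1_to_utf8` are made up for illustration, and it assumes ext/mbstring is loaded.

```php
<?php
// Hypothetical helpers (names are illustrative); assumes ext/mbstring.

function utf8_to_latin1(string $utf8): string
{
    // Reject input that is not valid UTF-8 instead of converting garbage.
    if (!mb_check_encoding($utf8, 'UTF-8')) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    // Characters with no ISO-8859-1 equivalent become '?' by default.
    return mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
}

function latin1_to_utf8(string $latin1): string
{
    // Every byte is a valid ISO-8859-1 character, so no validation is needed.
    return mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
}

$utf8 = "\xC3\x84\xC3\x96\xC3\x9C";   // "ÄÖÜ" written as UTF-8 bytes
$latin1 = utf8_to_latin1($utf8);
echo bin2hex($latin1), "\n";          // c4d6dc
```

Writing the test string as escaped bytes keeps the example independent of the encoding the source file happens to be saved in.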

Stefan Gehrig
  • Thanks for a good answer, and you and the others here are right: utf8_decode() seems to get the work done. There must have been some problems with files or my browser. At least I'm no longer able to reproduce the errors... (Maybe I did something wrong with my browser-charset-settings?) – qualbeen Dec 17 '08 at 19:01
  • Just for the record: I faced a similar situation, and noticed that iconv was being called twice (nested) on the same string variable. After I removed the first call, it worked like a charm. (utf8_decode and mb_convert_encoding were not used.) – thicolares Jun 08 '12 at 21:31
  • This advice helped me to solve a peculiar problem where a UTF-8 string ("Atlántico") was first literally encoded into ISO-8859-1 (looked like "Atlántico") and then these single-byte characters were re-encoded back to UTF-8 (looked exactly the same "Atlántico" but each character was UTF-8 encoded this time). utf8_decode() helped because it decoded the UTF-8 characters into their literal ANSI substitutes, which were then somehow mysteriously read and displayed properly as UTF-8 characters. Does that make sense or not? Hmm.. – Tyler Oct 25 '12 at 01:19
  • how to convert "सूगी, पोखरी तहसील में भारत à¤" characters to unicode human readable please suggest. – Harinarayan Apr 11 '22 at 11:05
  • @Harinarayan If you don't have a clue what encoding might have been used for the string, you're out of luck. There's no way to determine the encoding by just looking at the string. You can only guess. – Stefan Gehrig Apr 13 '22 at 07:15
  • 1
    utf8_encode/utf8_decode are deprecated: https://www.php.net/manual/en/function.utf8-encode.php – Juris Malinens Jan 11 '23 at 06:25
6

First of all, don't use different encodings. It leads to a mess, and UTF-8 is definitely the one you should be using everywhere.

Chances are your input is not ISO-8859-1, but something else (ISO-8859-15, Windows-1252). To convert from those, use iconv or mb_convert_encoding.

Nevertheless, utf8_encode and utf8_decode should work for ISO-8859-1. It would be nice if you could post a link to a file or a uuencoded or base64 example string for which the conversion fails or yields unexpected results.
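To illustrate why the exact source encoding matters, here is a minimal sketch (assuming ext/mbstring) that converts the same single byte under two different assumptions. Byte 0x80 is the euro sign in Windows-1252 but a control character in ISO-8859-1, so guessing wrong silently produces different output.

```php
<?php
// Assumes ext/mbstring. Byte 0x80 is the euro sign in Windows-1252,
// but a C1 control character (U+0080) in ISO-8859-1.
$byte = "\x80";

$asLatin1 = mb_convert_encoding($byte, 'UTF-8', 'ISO-8859-1');
$asCp1252 = mb_convert_encoding($byte, 'UTF-8', 'Windows-1252');

echo bin2hex($asLatin1), "\n";  // c280   (U+0080, an invisible control character)
echo bin2hex($asCp1252), "\n";  // e282ac (U+20AC, the euro sign)
```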

phihag
  • iconv, or mb_convert_encoding? iconv requires knowing the input encoding, which might not be the case. – Benubird Apr 28 '16 at 12:34
  • @Benubird If you're guessing the encoding, you're likely to run into even worse problems (the result is no longer easily reproducible, since it may depend on the frequency of characters). But you're right, `mb_convert_encoding` definitely belongs in this answer. Added. – phihag Apr 28 '16 at 13:01
  • 3
    "Avoid any encoding other than UTF8" is good advice in general but sometimes it's not possible. For example we're trying to get a 3rd party integration working where the party demands XML in Latin 1 format. – GordonM Mar 10 '17 at 15:30
3

It is much better to use

$value = mb_convert_encoding($value, 'HTML-ENTITIES', 'UTF-8');

Especially when you are using an AJAX call to submit 'ISO-8859-1' characters. It works for Chinese, Japanese, Czech, German and many more languages.
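Note that the 'HTML-ENTITIES' pseudo-encoding is deprecated in mbstring as of PHP 8.2. A roughly equivalent sketch, assuming ext/mbstring, uses mb_encode_numericentity() instead. It emits numeric entities (such as `&#252;`) rather than named ones, which browsers render the same way; the conversion map below is one common choice, not the only possible one.

```php
<?php
// Assumes ext/mbstring.
// Conversion map: [first code point, last code point, offset, mask].
// This one encodes every non-ASCII character and leaves ASCII untouched.
$convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];

$value = "Z\xC3\xBCrich";   // "Zürich" written as UTF-8 bytes
$encoded = mb_encode_numericentity($value, $convmap, 'UTF-8');

echo $encoded, "\n";        // Z&#252;rich
```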

VINAY KANT
  • 1
    For anyone else that uses this solution, be aware the function is actually mb_convert_encoding – b4tch Oct 21 '20 at 15:06
3

Use html_entity_decode() and htmlentities().

$html = html_entity_decode(htmlentities($html, ENT_QUOTES, 'UTF-8'), ENT_QUOTES , 'ISO-8859-1');

htmlentities() reads the input as UTF-8 and turns every character that has a named HTML entity into that entity; html_entity_decode() then turns those entities back into ISO-8859-1 bytes.
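Split into two steps, the round trip can be inspected like this. It is only a sketch using the two core functions named above; the sample string is made up for illustration.

```php
<?php
$utf8 = "caf\xC3\xA9";   // "café" written as UTF-8 bytes

// Step 1: non-ASCII characters become named entities (pure ASCII output).
$entities = htmlentities($utf8, ENT_QUOTES, 'UTF-8');
echo $entities, "\n";    // caf&eacute;

// Step 2: the entities are decoded into single-byte ISO-8859-1.
$latin1 = html_entity_decode($entities, ENT_QUOTES, 'ISO-8859-1');
echo bin2hex($latin1), "\n";   // 636166e9
```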

Axel
1

Set the meta tag in the head as

 <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" /> 

Use the table at http://www.i18nqa.com/debug/utf8-debug.html to look up the garbled character sequences you want to replace.

Then use str_replace like

$find = array('â€œ', 'â€™', 'â€¦', 'â€”', 'â€“', 'â€˜', 'Ã©', 'Â', 'â€¢', 'Ëœ', 'â€'); // UTF-8 bytes misread as Windows-1252
$replace = array('“', '’', '…', '—', '–', '‘', 'é', '', '•', '˜', '”');
$content = str_replace($find, $replace, $content);

It's the method I use, and it helps a lot. Thanks!

Robert
user2842936
0

I used:

function utf8_to_html ($data) {
    return preg_replace(
        array (
            '/ä/',
            '/ö/',
            '/ü/',
            '/é/',
            '/à/',
            '/è/'
        ),
        array (
            '&auml;',
            '&ouml;',
            '&uuml;',
            '&eacute;',
            '&agrave;',
            '&egrave;'
        ),
        $data 
    );
}
Luís Cruz
0

I use this function:

function formatcell($data, $num, $fill=" ") {
    $data = trim($data);
    $data=str_replace(chr(13),' ',$data);
    $data=str_replace(chr(10),' ',$data);
    // translate UTF8 to English characters
    $data = iconv('UTF-8', 'ASCII//TRANSLIT', $data);
    $data = preg_replace("/[\'\"\^\~\`]/i", '', $data);


    // fill it up with spaces
    for ($i = strlen($data); $i < $num; $i++) {
        $data .= $fill;
    }
    // limit string to num characters
    $data = substr($data, 0, $num);

    return $data;
}


echo formatcell("YES UTF8 String Zürich", 25, 'x'); // e.g. YES UTF8 String Zurichxxx (how ü is transliterated depends on the iconv implementation and locale)
echo formatcell("NON UTF8 String Zurich", 25, 'x'); // NON UTF8 String Zurichxxx

Check out my function in my blog http://www.unexpectedit.com/php/php-handling-non-english-characters-utf8

Luís Cruz
Ignacio Pascual
0

You need to use the iconv package, specifically its iconv function.
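A minimal sketch of what that could look like, assuming ext/iconv is available. The wrapper name `iconv_utf8_to_latin1` is hypothetical. With common iconv implementations, iconv() returns false and raises a notice when a character cannot be represented in the target encoding, so the result is worth checking.

```php
<?php
// Hypothetical wrapper (the name is illustrative); assumes ext/iconv.
function iconv_utf8_to_latin1(string $utf8): ?string
{
    // iconv() returns false on failure; @ silences the accompanying notice
    // so that the caller only has to check for null.
    $result = @iconv('UTF-8', 'ISO-8859-1', $utf8);
    return $result === false ? null : $result;
}

$converted = iconv_utf8_to_latin1("Z\xC3\xBCrich");   // "Zürich" written as UTF-8 bytes
echo $converted === null ? "conversion failed\n" : bin2hex($converted) . "\n";   // 5afc72696368
```

Appending `//TRANSLIT` or `//IGNORE` to the target encoding changes how unconvertible characters are handled, but the exact behavior depends on the underlying iconv implementation.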

Martin v. Löwis
0

In my case, after files with names containing those characters were uploaded, they were not even visible in FileZilla! In the cPanel file manager they were shown as ? (on a black background). This combination made the names display correctly in the browser (the HTML document is Western-encoded):

$dspFileName = utf8_decode(htmlspecialchars(iconv(mb_internal_encoding(), 'utf-8', basename($thisFile['path']))) );
user109764
-2
function parseUtf8ToIso88591(&$string){
    // Converts $string in place (passed by reference); null is left untouched.
    if(!is_null($string)){
        $string = mb_convert_encoding($string, 'ISO-8859-1', 'UTF-8');
    }
}
user1786647