
I still don't understand how iconv works.

For instance,

$string = "Löic & René";
$output = iconv("UTF-8", "ISO-8859-1//TRANSLIT", $string); 

I get,

Notice: iconv() [function.iconv]: Detected an illegal character in input string in...

With $string = "Löic"; or $string = "René";

I get,

Notice: iconv() [function.iconv]: Detected an incomplete multibyte character in input string in.

I get nothing with $string = "&";

I need two different outputs, stored in two different columns of my database table:

  1. Löic & René converted to Loic & Rene, for clean URL purposes.

  2. Löic & René kept as it is, and only converted with htmlentities($string, ENT_QUOTES); when displayed on my HTML page.

I tried some of the suggestions from php.net below, but they still don't work:

I had a situation where I needed some characters transliterated, but the others ignored (for weird diacritics like ayn or hamza). Adding //TRANSLIT//IGNORE seemed to do the trick for me. It transliterates everything that is able to be transliterated, but then throws out stuff that can't be.

So:

$string = "ʿABBĀSĀBĀD";

echo iconv('UTF-8', 'ISO-8859-1//TRANSLIT', $string);
// output: [nothing, and you get a notice]

echo iconv('UTF-8', 'ISO-8859-1//IGNORE', $string);
// output: ABBSBD

echo iconv('UTF-8', 'ISO-8859-1//TRANSLIT//IGNORE', $string);
// output: ABBASABAD
// Yay! That's what I wanted!

and another,

Andries Seutens 07-Nov-2009 07:38
When doing transliteration, you have to make sure that your LC_CTYPE is properly set, otherwise the default POSIX will be used.

To transform "rené" into "rene" we could use the following code snippet:
setlocale(LC_CTYPE, 'nl_BE.utf8');

$string = 'rené';
$string = iconv('UTF-8', 'ASCII//TRANSLIT', $string);

echo $string; // outputs rene

How can I actually get these to work?

Thanks.

EDIT:

This is the source file I use to test the code:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" class="no-js">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
</head>
<?php
$string = "Löic & René";
$output = iconv("UTF-8", "ISO-8859-1//TRANSLIT", $string); 
?>
</html>
Rais Alam
Run
  • BTW: you do realize `ö` & `é` are in ISO-8859-1? Aside from the improper input charset, you might want to alter your output charset to `ASCII//TRANSLIT`. – Wrikken Jan 25 '11 at 14:51
  • I am so confused with these charsets... – Run Jan 25 '11 at 15:09
  • Thanks! I had to decode some Korean characters to UTF-8 and it was a real headache - eventually, the only thing that helped was doing: `$converted = iconv('EUC-KR', 'UTF-8//TRANSLIT', $data);` – ShayLivyatan Jul 04 '16 at 09:23

2 Answers

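// utf8_encode() treats $s as ISO-8859-1; omit it if $s is already UTF-8 (it is deprecated since PHP 8.2)
$clean = iconv('UTF-8', 'ASCII//TRANSLIT', utf8_encode($s));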
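
As a rough sketch of how this could cover both columns from the question, assuming the input is already valid UTF-8 (so no utf8_encode() step is needed). The helper name make_slug and the locale en_US.UTF-8 are assumptions for illustration; the exact transliteration depends on the installed locale and on the iconv implementation, so the slug may differ between servers.

```php
<?php
// Assumption: this locale is installed. If it is not, setlocale() returns
// false and iconv may transliterate "ö" to "?" instead of "o".
setlocale(LC_CTYPE, 'en_US.UTF-8');

// Hypothetical helper: build an ASCII-only slug from a UTF-8 string.
function make_slug(string $utf8): string
{
    // @ hides the notice iconv raises on bad input; the false check handles it.
    $ascii = @iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $utf8);
    if ($ascii === false) {
        $ascii = $utf8; // conversion failed; the regex below still strips non-ASCII bytes
    }
    // Collapse every run of characters outside a-z and 0-9 into one hyphen.
    $slug = preg_replace('/[^a-z0-9]+/', '-', strtolower($ascii));
    return trim($slug, '-');
}

// "Löic & René" written as explicit UTF-8 bytes, so the demo does not
// depend on the encoding this file happens to be saved in.
$name = "L\xC3\xB6ic & Ren\xC3\xA9";

$slug    = make_slug($name);                          // column 1: URL-safe form
$display = htmlentities($name, ENT_QUOTES, 'UTF-8');  // column 2 stores $name unchanged; encode only on output

echo $slug, "\n";
echo $display, "\n"; // L&ouml;ic &amp; Ren&eacute;
```

Storing the original string unchanged and escaping it only at output time keeps the database free of HTML entities.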
animuson
Riccardo

And did you save your source file in UTF-8 encoding? If not (and I guess you didn't since that will produce the "incomplete multibyte character" error), then try that first.
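
One way to check this is to look at the raw bytes of the string. The sketch below is illustrative only: is_valid_utf8 is a hypothetical helper, and the strings are written with byte escapes so the result does not depend on how the file itself is saved. In ISO-8859-1 "é" is the single byte E9, which UTF-8 reads as the first byte of a three-byte sequence; that is probably why a string ending in "é" is reported as an incomplete multibyte character.

```php
<?php
$utf8   = "L\xC3\xB6ic"; // UTF-8: "ö" is the two bytes C3 B6
$latin1 = "L\xF6ic";     // ISO-8859-1: "ö" is the single byte F6

// Hypothetical helper: true when $s is well-formed UTF-8.
// preg_match() with the /u modifier returns false for invalid UTF-8 subjects.
function is_valid_utf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

echo bin2hex($utf8), "\n";   // 4cc3b66963
echo bin2hex($latin1), "\n"; // 4cf66963

var_dump(is_valid_utf8($utf8));   // bool(true)
var_dump(is_valid_utf8($latin1)); // bool(false)
```

Running bin2hex() on the literal from the test file shows which of the two forms the editor actually wrote to disk.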

wimvds
  • Most probably the answer (or, if the strings don't originate in a file, an improper character set for whatever source the data comes from: db, http, etc.). One thing is for certain: the input isn't utf-8. – Wrikken Jan 25 '11 at 14:49
  • @wimvds: thanks. how do I save my source file in UTF-8 encoding? Please see my edit above - I have utf-8 in my meta - is it correct? – Run Jan 25 '11 at 14:50
  • @lauthiamkok: If you're still testing the examples above then use a good editor/IDE that allows you to select the file encoding (e.g. Notepad++ on Windows, Eclipse/NetBeans on any major OS). For input from webpages you should either use meta tags or the relevant `header()` calls (or preferably both) and when using MySQL, make sure you open that connection in UTF-8 mode as well (`SET NAMES 'utf8'`). – wimvds Jan 25 '11 at 14:59
  • @wimvds : I use Notepad++ and I just selected the file encoding - Encoding -> Encode in utf-8. Then it shows something strange in the code - $string = "LE6ic"; and I still have the same error message... – Run Jan 25 '11 at 15:06
  • @lauthiamkok: That's normal, you should use the `Convert` options in the Encoding menu, not the `Encode` options (oh, and you should select UTF-8 without BOM) if you want to change the encoding on a file that already contains encoded characters... You can also set NP++ to default to UTF-8 encoding for new files in Settings, Preferences, New Document/Default Directory, so that you don't forget to switch the encoding to UTF-8 when editing new files. – wimvds Jan 25 '11 at 15:46
  • @wimvds: thanks for the help. I seem to have it right now. No more error messages, but it shows this: L�ic & Ren�. So I use htmlentities($output, ENT_QUOTES); to convert them into HTML entities. Wow, I never knew that the encoding of the saved file could make so much difference! Thank you very much. – Run Jan 25 '11 at 16:06
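
The replacement characters in that last comment are what a browser tends to show when ISO-8859-1 bytes arrive in a page declared as UTF-8. The following is a minimal sketch of the two ways around it, using byte escapes for the test string. It assumes a current PHP version, where htmlentities() expects UTF-8 unless told otherwise and may return an empty string for bytes that are not valid UTF-8.

```php
<?php
// "Löic & René" as explicit UTF-8 bytes.
$utf8 = "L\xC3\xB6ic & Ren\xC3\xA9";

// Both accented letters exist in ISO-8859-1, so no //TRANSLIT is needed here.
$latin1 = iconv('UTF-8', 'ISO-8859-1', $utf8); // "L\xF6ic & Ren\xE9"

// Option 1: tell htmlentities() which encoding the converted bytes are in.
echo htmlentities($latin1, ENT_QUOTES, 'ISO-8859-1'), "\n"; // L&ouml;ic &amp; Ren&eacute;

// Option 2: skip the conversion and keep UTF-8 from source file to page.
echo htmlentities($utf8, ENT_QUOTES, 'UTF-8'), "\n"; // L&ouml;ic &amp; Ren&eacute;
```

Option 2 avoids the iconv() call entirely when the page is already served as UTF-8.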