
The function converts every character correctly except 'ø', which is a UTF-8 character. All the other characters, such as "Ч, Č, Ć, Đ, Š, Ž, Ђ, Ж, Љ", convert to ASCII normally.

This is the function I use:

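```php
function toAscii($str, $replace = array(), $delimiter = '-') {
    // Optionally blank out caller-supplied characters or substrings first.
    if (!empty($replace)) {
        $str = str_replace((array)$replace, ' ', $str);
    }

    // Transliterate the UTF-8 input to ASCII.
    $clean = iconv('UTF-8', 'ASCII//TRANSLIT', $str);
    // Remove everything except letters, digits and separator characters.
    $clean = preg_replace("/[^a-zA-Z0-9\/_|+ -]/", '', $clean);
    $clean = strtolower(trim($clean, '-'));
    // Collapse runs of separators into a single delimiter.
    $clean = preg_replace("/[\/()_|+ -]+/", $delimiter, $clean);
    return $clean;
}
```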

I need the ASCII version for use in a URL.

DocNet
  • You can base64-encode anything for a URL. – T.S. Sep 08 '15 at 19:01
  • What do you mean? If I use base64_encode, it will convert "Tromsø IL" into something like VGhpcyBpcyBhbiBlbmNvZGVkIHN0cmluZw==. I just want "tromso" (ø => o). – DocNet Sep 08 '15 at 19:08
  • What is your encoding set to? When I run `toAscii("Ч,Č,Ć,Đ,Š,Ž,Ђ,Ж,Љ", array(), '-');` I receive "Notice: iconv(): Detected an illegal character in input string", but when I run `toAscii("ø", array(), '-');` it converts to "o". – mseifert Sep 08 '15 at 19:30
  • I select the data from a database that contains UTF-8 characters. Interestingly, the characters reported as "illegal" convert normally... – DocNet Sep 08 '15 at 19:45
  • You might try mb_detect_encoding($string) on that database data to verify what you have coming in. – mseifert Sep 08 '15 at 19:49
  • Yes, I did that: the values are ASCII and UTF-8, and "Tromsø IL" is UTF-8. How is it possible that toAscii('Tromsø IL') returns tromso-il, but toAscii($row['local']) on the value from the database returns troms-il? When I echo $row['local'] it shows Tromsø IL, and mb_detect_encoding reports UTF-8, yet strpos($str, 'ø') doesn't find the character. Do you have any idea what the problem might be? – DocNet Sep 08 '15 at 19:54
  • Just for kicks, try mb_strpos($str, 'ø'). You should also loop through the database string and echo the ord() of each character. If you don't know how to do this, I can post an example. – mseifert Sep 08 '15 at 20:01
  • Nope, mb_strpos doesn't work either. Could you please give me an example? – DocNet Sep 08 '15 at 20:13
  • Yes, you will get a base64-encoded string, which you decode on the other end to get the data back. – T.S. Sep 08 '15 at 20:18
  • @T.S. Could you please show me an example? Printing ord() for the whole string gives me 84114111109115195184327376. When I compare it with an ASCII converter, the end looks different: the converter gives 084 114 111 109 115 195 184 032 073 076, so PHP ends in 327376 while the converter ends in 073 076. Converter: http://www.unit-conversion.info/texttools/ascii/ – DocNet Sep 08 '15 at 20:19
  • Check this out http://stackoverflow.com/a/5835352/1704458 – T.S. Sep 08 '15 at 20:24
  • For passing it in a URL, I would advise the same as T.S. On the other end, use base64_decode to convert it back. – mseifert Sep 08 '15 at 20:24
  • If I use it at the beginning of the toAscii function, I get the same result: `$str = base64_url_encode($str); $str = base64_url_decode($str); $clean = iconv('UTF-8', 'ASCII//TRANSLIT', $str); $clean = preg_replace("/[^a-zA-Z0-9\/_|+ -]/", '', $clean); $clean = strtolower(trim($clean, '-')); $clean = preg_replace("/[\/()_|+ -]+/", $delimiter, $clean); return $clean;` – DocNet Sep 08 '15 at 20:30
  • You don't need to use your function at all. Just urlencode() the string before putting it in a URL and urldecode() it afterwards; your string then stays untouched. This handles your stated need of ASCII for a URL. – mseifert Sep 08 '15 at 20:40
  • I think the 'ø' from the database has a different value from the 'ø' that ASCII accepts... When I reduce the toAscii function to just `$str = base64_url_encode($str); $str = base64_url_decode($str);`, I get question-mark diamonds in the URL... – DocNet Sep 08 '15 at 20:50
  • See: [UTF-8 all the way through](http://stackoverflow.com/questions/279170/utf-8-all-the-way-through) – mseifert Sep 08 '15 at 21:00
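mseifert suggested looping over the database string and printing ord() for each character. A minimal PHP sketch of that idea follows; `dumpBytes()` is a hypothetical helper name, and the script is assumed to be saved as UTF-8. Printing the values with a separator matters: the trailing 327376 in DocNet's output is just 32, 73, 76 (space, 'I', 'L') written with no spaces between them, so those bytes are the same as the converter's 084 114 111 109 115 195 184 032 073 076.

```php
<?php
// Hypothetical helper: list every byte of a string as a decimal ord() value,
// separated by spaces so neighbouring values cannot run together.
// strlen() and $s[$i] work on bytes, not characters, so a UTF-8 'ø'
// appears as two bytes: 195 184 (0xC3 0xB8).
function dumpBytes(string $s): string
{
    $out = [];
    for ($i = 0, $n = strlen($s); $i < $n; $i++) {
        $out[] = ord($s[$i]);
    }
    return implode(' ', $out);
}

// With this file saved as UTF-8, prints: 84 114 111 109 115 195 184 32 73 76
echo dumpBytes('Tromsø IL'), PHP_EOL;

// Compare a database value against the literal (hypothetical $row):
// var_dump(dumpBytes($row['local']) === dumpBytes('Tromsø IL'));
```

If the two dumps differ, the database value is not the same byte sequence as the literal in the script. If they match, the input is identical, and any difference in the output has to come from somewhere else, such as the environment the code runs in.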

1 Answer


The iconv "transliterate to ASCII" mode (`//TRANSLIT`) is unreliable. There isn't always a universal transliteration from an arbitrary Unicode code point to ASCII, or one that makes sense. There is no Unicode standard saying how to do it (at one point there was a draft, but it was abandoned as unsuccessful). What iconv produces also depends on the underlying implementation and, with glibc, on the current LC_CTYPE locale, so the same input can come out differently in different environments. So there isn't a reliable way to do this, or at least iconv isn't one. That's just how it goes.
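If you still want ASCII slugs such as tromso-il, one option that avoids depending on iconv's behaviour is a transliteration table you maintain yourself for the characters your data actually contains. The sketch below illustrates that approach under stated assumptions: the function name `slugify()` and the mapping entries are hypothetical, the source file is assumed to be UTF-8, and `$delimiter` is assumed to be a single character. Any character missing from the table is simply dropped, so meaning can still be lost.

```php
<?php
// Hypothetical slugify(): transliterates with an explicit table instead of
// iconv('UTF-8', 'ASCII//TRANSLIT', ...). The table is an example only;
// extend it with the characters your data actually contains.
function slugify(string $str, string $delimiter = '-'): string
{
    static $map = [
        'ø' => 'o', 'Ø' => 'O',   // 'oe' is another common choice
        'æ' => 'ae', 'Æ' => 'AE',
        'å' => 'a', 'Å' => 'A',
        'č' => 'c', 'Č' => 'C',
        'ć' => 'c', 'Ć' => 'C',
        'đ' => 'd', 'Đ' => 'D',
        'š' => 's', 'Š' => 'S',
        'ž' => 'z', 'Ž' => 'Z',
    ];

    // strtr() with an array matches byte sequences (longest key first) and
    // never rescans replaced text, so multi-byte UTF-8 keys work as expected.
    $clean = strtr($str, $map);

    // Drop every byte still outside the allowed ASCII set (unmapped
    // characters, e.g. Cyrillic letters, disappear here).
    $clean = preg_replace('/[^a-zA-Z0-9\/_|+ -]/', '', $clean);
    $clean = strtolower(trim($clean));

    // Collapse runs of separators into one delimiter, then trim stray ones.
    $clean = preg_replace('/[\/_|+ -]+/', $delimiter, $clean);
    return trim($clean, $delimiter);
}

echo slugify('Tromsø IL'), PHP_EOL; // tromso-il
```

The trade-off is maintenance: the table only covers what you put in it, but its output is the same on every server.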

As others have commented, though, there is a standard way to put Unicode into a URL: percent-encode its UTF-8 bytes. Trying to transliterate arbitrary Unicode code points to ASCII is unlikely to be the right solution to your problem; even where transliteration does happen, you are likely losing meaning.
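A short sketch of that standard approach in PHP: percent-encoding (RFC 3986) writes each UTF-8 byte as %XX, so the URL itself is pure ASCII while the original text is preserved exactly. `rawurlencode()`, `rawurldecode()` and `urlencode()` are built-in PHP functions; the example.com URL is only a placeholder, and the file is assumed to be saved as UTF-8.

```php
<?php
// Percent-encode a UTF-8 string for a URL path segment and decode it back.
// rawurlencode() follows RFC 3986 (space becomes %20); urlencode() instead
// turns spaces into "+", the HTML form-encoding convention.
$name = 'Tromsø IL';

$encoded = rawurlencode($name);
echo $encoded, PHP_EOL;          // Troms%C3%B8%20IL

$decoded = rawurldecode($encoded);
var_dump($decoded === $name);    // bool(true)

// Placeholder URL: ASCII-only, yet the original text survives unchanged.
echo 'https://example.com/team/' . $encoded, PHP_EOL;
```

Values read back from the query string through `$_GET` are already decoded by PHP, so no explicit decode call is needed there.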

jrochkind