I have a string in Korean (a multi-byte string) with UTF-8 encoding. When I use mb_substr(), it fails to treat the string as multi-byte, so mb_substr() behaves like substr() and I end up with gibberish characters like "�" at the end of the string:
星期三大象键盘开裂青蛙混杂纪念碑问题面包车斑马线 수요일 코끼리 키보드 개구리 뒤범벅 비석 이 질문에 반 얼룩말을 크래킹
Also, when I run mb_detect_encoding() on the string, I get UTF-8. Any ideas where I am going wrong?
The function I am currently using is:
function cleanseData($data, $mode = false, $limit = 0) {
    if ($mode) {
        $data = (mb_strlen($data) > ($limit + 3)) ? mb_substr($data, 0, $limit, mb_detect_encoding($data)) . '...' : $data;
    }
    $data = utf8tohtml($data, true);
    return $data;
}
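
To isolate the problem, here is a minimal sketch, assuming the mbstring extension is loaded. truncateUtf8() is a hypothetical helper, not part of the code above. It passes 'UTF-8' explicitly to both mb_strlen() and mb_substr() instead of relying on mb_internal_encoding() or on detection, and it leaves out utf8tohtml() (whose implementation is not shown here), so that step can be checked separately.

```php
<?php
// Hypothetical helper for illustration only: truncate by characters,
// naming the encoding explicitly on every mbstring call.
function truncateUtf8(string $data, int $limit): string
{
    if (mb_strlen($data, 'UTF-8') > $limit + 3) {
        return mb_substr($data, 0, $limit, 'UTF-8') . '...';
    }
    return $data;
}

$sample = '수요일 코끼리 키보드 개구리';

// Character-aware cut: keeps whole characters.
$short = truncateUtf8($sample, 5);

// Byte-based cut for comparison: 5 bytes ends in the middle of the
// second character, because each Hangul syllable is 3 bytes in UTF-8.
$broken = substr($sample, 0, 5);

var_dump(mb_check_encoding($short, 'UTF-8'));
var_dump(mb_check_encoding($broken, 'UTF-8'));
```

If the truncated string is still valid UTF-8 at this point but "�" shows up in the final output, the damage is probably happening in a later step rather than in mb_substr() itself.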