I've searched around for a while and haven't yet found something that'll work for me. I am using a PHP form to submit data into SAP using the SAP DI API. I need to figure out which character set will actually allow me to store and work with Vietnamese characters.
UTF-8 seems to work for a lot of the characters, but ô becomes Ã´. More importantly, there are character limits, and UTF-8 breaks them. If I have a string of 30 characters, the API is told that it's more than 50. The same is true for storing in MySQL: if there's a varchar character limit, UTF-8 causes the string to go above it.
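One way to see why the limits appear to be exceeded is to compare byte counts with character counts. The sketch below assumes the mbstring extension is loaded and PHP 7 or later (for the `\u{...}` escapes). It also assumes the limit is being checked in bytes, which nothing above confirms, and `fitsCharLimit` is a hypothetical helper name.

```php
<?php
// Built from code points so the result does not depend on how this file is saved.
$str = "V\u{1EAD}TC\u{00F4}NG"; // VậTCôNG

// strlen() counts bytes: ậ takes 3 bytes and ô takes 2 bytes in UTF-8.
$bytes = strlen($str);

// mb_strlen() counts characters when told the string is UTF-8.
$chars = mb_strlen($str, 'UTF-8');

echo "bytes: $bytes, characters: $chars\n"; // bytes: 10, characters: 7

// Hypothetical helper: check a limit in characters rather than bytes.
function fitsCharLimit(string $s, int $maxChars): bool
{
    return mb_strlen($s, 'UTF-8') <= $maxChars;
}

var_dump(fitsCharLimit($str, 7)); // bool(true)
```

If the byte count is what crosses the limit, the string itself is fine and the length check is the part to look at.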
Unfortunately, when I search, UTF-8 seems to be the only thing people suggest for Vietnamese characters. If I don't encode the characters at all, they get stored as their HTML character codes. I've also tried ISO-8859-1 and converting into UCS-2 or UCS-4, and I'm really at a loss. If anyone has experience working with Vietnamese characters, your help would be greatly appreciated.
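Values stored as HTML character codes (for example `&#7853;` instead of ậ) are numeric character references. A minimal sketch of turning them back into UTF-8 text follows. The `$stored` value is a made-up sample, not data from the actual form.

```php
<?php
// Hypothetical sample of what might end up stored for "VậTCôNG".
$stored = 'V&#7853;TC&#244;NG';

// Decode numeric character references into UTF-8 characters.
$decoded = html_entity_decode($stored, ENT_QUOTES | ENT_HTML5, 'UTF-8');

echo $decoded, "\n";
var_dump($decoded === "V\u{1EAD}TC\u{00F4}NG"); // bool(true)
```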
UPDATE
It appears the issue may be with my WampServer on Windows. Here's a bit of code that is confusing me:
$str = 'VậTCôNG';
$str1 = utf8_encode($str);
if (mb_detect_encoding($str, "UTF-8", true) == true) {
    print_r('yes');
    if ($str1 == $str) {
        print_r('yes2');
    }
}
echo $str . $str1;
This prints "yes" but not "yes2", and $str . $str1 shows up as "VậTCôNGVáºTCôNG" in the browser.
I have my php.ini file with:
default_charset = "utf-8"
and my httpd.conf file with:
AddDefaultCharset UTF-8
and my php file I'm running has:
header("Content-type: text/html; charset=utf-8");
So I'm now wondering: if the original string was UTF-8, why wouldn't it equal a UTF-8 encoding of itself? And why is the UTF-8 encoding returning the wrong characters? Is something wrong in the WampServer configuration?
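For what it's worth, `utf8_encode()` does not check what encoding its input is in. It treats every byte as an ISO-8859-1 character and re-encodes it, so a string that is already UTF-8 gets encoded a second time. The sketch below reproduces that with `mb_convert_encoding()`, which performs the same ISO-8859-1 to UTF-8 conversion without the deprecation notice that `utf8_encode()` raises on PHP 8.2+. It assumes mbstring is loaded and PHP 7 or later.

```php
<?php
// Already valid UTF-8, built from code points so file encoding does not matter.
$utf8 = "V\u{1EAD}TC\u{00F4}NG"; // VậTCôNG

// Same conversion utf8_encode() performs: input is assumed to be ISO-8859-1.
$double = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

var_dump($double === $utf8);                       // bool(false)
echo strlen($utf8), ' -> ', strlen($double), "\n"; // 10 -> 15

// Going back the other way undoes the extra layer of encoding.
$restored = mb_convert_encoding($double, 'ISO-8859-1', 'UTF-8');
var_dump($restored === $utf8);                     // bool(true)
```

Each of the five non-ASCII bytes in the original becomes two bytes, which accounts both for the mismatch and for the extra length.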