I have an RSS feed that's generated from user-entered data. Many users enter text in Japanese, and most of the time there are no issues. However, one particular RSS feed displays this error:
error on line 25 at column 25: Input is not proper UTF-8, indicate encoding !
Bytes: 0x0B 0x32 0x38 0x20
Note that in this particular RSS feed, this is not the first place where Japanese characters appear.
I've seen other answers (for example, Error: "Input is not proper UTF-8, indicate encoding !" using PHP's simplexml_load_string) which suggest changing the encoding, but I'm confused as to why the encoding is only failing on this particular feed. Also, if it's because this one person entered Japanese encoded in a different manner, how could I detect when someone enters text in a different encoding, and selectively fix only the entries that might cause an issue?
EDIT: Per this article: http://www.localizingjapan.com/blog/2012/01/30/detecting-and-conveting-japanese-multibyte-encodings-in-php/
I tried adding the following:
// Only re-encode content that fails UTF-8 validation.
if (!mb_check_encoding($content, "UTF-8")) {
    // Candidate source encodings (list taken from the article above).
    $content = mb_convert_encoding($content, "UTF-8",
        "Shift-JIS, EUC-JP, JIS, SJIS, JIS-ms, eucJP-win, SJIS-win, ISO-2022-JP, " .
        "ISO-2022-JP-MS, SJIS-mac, SJIS-Mobile#DOCOMO, SJIS-Mobile#KDDI, " .
        "SJIS-Mobile#SOFTBANK, UTF-8-Mobile#DOCOMO, UTF-8-Mobile#KDDI-A, " .
        "UTF-8-Mobile#KDDI-B, UTF-8-Mobile#SOFTBANK, ISO-2022-JP-MOBILE#KDDI");
}
But the feed is still being reported as not properly encoded in UTF-8.
Edit2: I'm extremely confused, because I just had it log what encoding mb_detect_encoding thinks the text is, and everything comes back as either ASCII (which must be other fields, since Japanese obviously cannot be ASCII) or UTF-8. Do you have any idea why the text can be detected as UTF-8 and still produce these encoding errors?
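A minimal sketch of the kind of diagnostic logging described in Edit2, for anyone trying to reproduce this. The helper names `describe_field` and `find_reported_bytes` are hypothetical, and the candidate encoding list passed to `mb_detect_encoding` is an assumption rather than what the feed code actually uses. Since the first reported byte, 0x0B, is a single-byte control character (vertical tab) and not part of a multibyte sequence, the byte run from the error message can also be searched for directly in each field.

```php
<?php
// Hypothetical helper: summarize one field for a log line.
function describe_field(string $content): string
{
    // Strict mode rejects candidate encodings for which the bytes are invalid.
    // The candidate list here is an assumption for illustration.
    $detected = mb_detect_encoding($content, ['ASCII', 'UTF-8', 'SJIS', 'EUC-JP'], true);
    $validUtf8 = mb_check_encoding($content, 'UTF-8');

    return sprintf(
        'detected=%s valid_utf8=%s hex=%s',
        $detected === false ? 'unknown' : $detected,
        $validUtf8 ? 'yes' : 'no',
        bin2hex($content) // raw bytes, to compare against the parser's "Bytes:" line
    );
}

// Hypothetical helper: byte offset of the run quoted in the parser error, or null.
// The default is the sequence from the error above: 0x0B 0x32 0x38 0x20.
function find_reported_bytes(string $content, string $bytes = "\x0B\x32\x38\x20"): ?int
{
    $pos = strpos($content, $bytes);
    return $pos === false ? null : $pos;
}
```

Calling `find_reported_bytes($content)` on each field that goes into the feed should show which entry contains the reported bytes, and `describe_field($content)` shows whether that same entry still passes the UTF-8 check.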