RSS feed encoding

Question

I have an RSS feed that's generated from user-entered data. Many users enter text in Japanese, and most of the time there are no issues. However, one particular RSS feed displays this error:

error on line 25 at column 25: Input is not proper UTF-8, indicate encoding !
Bytes: 0x0B 0x32 0x38 0x20

Note that in this particular RSS feed, this is not the first place where Japanese characters appear.

I've seen other answers (for example, Error: "Input is not proper UTF-8, indicate encoding !" using PHP's simplexml_load_string) that suggest changing the encoding, but I'm confused as to why the encoding fails only on this particular feed. Also, if it's because this one person entered Japanese encoded in a different manner, how could I detect when someone enters text in a different encoding, and selectively fix only the input that might cause an issue?

EDIT: Per this article: http://www.localizingjapan.com/blog/2012/01/30/detecting-and-conveting-japanese-multibyte-encodings-in-php/

I tried adding the following:

if (!mb_check_encoding($content, "UTF-8")) {

           $content = mb_convert_encoding($content, "UTF-8",
              "Shift-JIS, EUC-JP, JIS, SJIS, JIS-ms, eucJP-win, SJIS-win, ISO-2022-JP,
               ISO-2022-JP-MS, SJIS-mac, SJIS-Mobile#DOCOMO, SJIS-Mobile#KDDI,
               SJIS-Mobile#SOFTBANK, UTF-8-Mobile#DOCOMO, UTF-8-Mobile#KDDI-A,
               UTF-8-Mobile#KDDI-B, UTF-8-Mobile#SOFTBANK, ISO-2022-JP-MOBILE#KDDI");
        }

But it's still reported as not being properly encoded in UTF-8.

Edit2: I'm extremely confused, because I just had it log what encoding mb_detect_encoding thinks the text is, and everything comes back as either ASCII (which must be other fields, since Japanese obviously cannot be ASCII) or UTF-8. Do you have any idea why it would think the text is UTF-8 while I'm still getting these encoding errors?

You haven't shown the actual feed in your question. It might be that the XML in the feed is already broken, so no standard XML parser will accept it without fixing it first. – hakre, Jan 30 '15 at 21:19
Definitely not the XML that is broken. Currently I suspect there's something really weird about the text the user entered. I tried copy/pasting it into OpenOffice, then, without making any edits, copy/pasted it back into the app, generated the RSS feed again, and it worked. – Kai, Jan 30 '15 at 21:24
If a user can enter text and that text is processed so wrongly that an invalid XML RSS feed is created, then the XML is in fact broken. Sure, the XML has not broken by itself (the entered data broke it), but the XML being broken is exactly what the error message tells you. So first of all you have to accept that the error is there for a reason. – hakre, Jan 30 '15 at 21:27
The text in question contains nothing but Japanese characters. It isn't causing malformed XML. – Kai, Jan 30 '15 at 21:31
You're barking up the wrong tree. ***The XML is broken***; this is why you see the error message. Character encoding is part of the XML document. The specs are here: http://www.w3.org/TR/REC-xml/#charsets – hakre, Jan 30 '15 at 21:34
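
The bytes reported in the error start with 0x0B, a vertical tab. That byte is valid UTF-8, so an encoding check would accept it, but it lies outside the Char production in the XML spec linked above, which makes it a plausible culprit. Below is a minimal sketch, assuming that is the cause, of stripping such characters before the text goes into the feed. The helper name `strip_invalid_xml_chars` is hypothetical and does not come from the thread.

```php
<?php
// Hypothetical helper (an illustration, not code from the thread):
// removes every character that XML 1.0 does not allow in a document.
function strip_invalid_xml_chars(string $text): string
{
    // XML 1.0 Char production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    $clean = preg_replace(
        '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u',
        '',
        $text
    );

    // With the /u modifier, preg_replace() returns null if $text is not valid UTF-8.
    if ($clean === null) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }

    return $clean;
}

// Japanese text followed by a vertical tab and "28 ", as in the reported bytes.
$input  = "\u{65E5}\u{672C}\u{8A9E}\x0B28 ";
$output = strip_invalid_xml_chars($input);

var_dump($output === "\u{65E5}\u{672C}\u{8A9E}28 "); // bool(true)
```

Tab, line feed, and carriage return are kept because the XML spec allows them; only the other control characters and similar forbidden code points are dropped.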

score -1 · Answer 1 · answered Jan 30 '15 at 16:41

Make sure you've properly encoded the user's input as UTF-8.

http://php.net/manual/de/function.utf8-encode.php

string utf8_encode ( string $data )

montelyno

37
2

No go. Just wrapping that around the user input makes the Japanese no longer appear correctly, and the RSS feed is still reporting improper encoding. – Kai Jan 30 '15 at 16:59
Just throwing functions out for fun? From the first sentence of the question: *"There are many users that are entering in text in Japanese"* - `utf8_encode` was not created for Japanese text. – hakre Jan 30 '15 at 21:17
Yeah, just for fun. Are you serious? I'm trying to help, but your comment is meaningless. Search for solutions or stop answering with this kind of content. – montelyno Jan 31 '15 at 20:15
What I wanted to say with the comment is: `utf8_encode` is ***never*** a useful function in the context of the question. Your answer suggests otherwise, therefore it is not helpful. Apart from the function suggestion your answer technically doesn't seem to be very wrong, however obviously the person asking isn't aware how to actually achieve that. – hakre Feb 02 '15 at 09:50
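
To illustrate the point made in these comments: `utf8_encode` treats its input as ISO-8859-1, so applying it to text that is already UTF-8 re-encodes every byte as a separate character. The sketch below reproduces that conversion with `mb_convert_encoding`, since `utf8_encode` is deprecated as of PHP 8.2. The function name `latin1_to_utf8` is made up for this example, and the sample string is an assumption, not data from the feed.

```php
<?php
// Same conversion that utf8_encode() performs: ISO-8859-1 to UTF-8.
// latin1_to_utf8 is a hypothetical name used only for this illustration.
function latin1_to_utf8(string $bytes): string
{
    return mb_convert_encoding($bytes, 'UTF-8', 'ISO-8859-1');
}

// Three Japanese characters, already encoded as UTF-8 (9 bytes).
$japanese = "\u{65E5}\u{672C}\u{8A9E}";

// Each of the 9 bytes is treated as one Latin-1 character and re-encoded.
$mangled = latin1_to_utf8($japanese);

var_dump(mb_strlen($japanese, 'UTF-8')); // int(3)
var_dump(mb_strlen($mangled, 'UTF-8'));  // int(9)
var_dump($mangled === $japanese);        // bool(false)
```

The result is still valid UTF-8, which is why the text displays incorrectly rather than failing a UTF-8 check.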
