I have a database table called "tweets" that holds tweets I downloaded using the Twitter Search API. The table collation is set to latin1_swedish_ci, as advised by MySQL, since it holds up for at least the English language (I read this somewhere on the MySQL support pages).
Anyway, I now see a lot of tweets looking like this:
$S&P news: Bank of America’s Mortgage-Bond Accord http://bit.ly/oTXC5a
@LucciAlerts >> $BAC from a pincher play setup
미êµì‹ 용등급ì´ë–¨ì–´ì¡Œë„¤ RT @CNBC RT @alexcrippen: S&P affirms AA+
- I believe HTML entities (&amp; and the like) can be fixed by using PHP's htmlspecialchars_decode() to translate them back to the original characters;
- But I don't know how to fix "America’s", for instance. Obviously, ’ should be an apostrophe ('), but how do I get it back?
- Finally, some people like to put all sorts of special (non-ASCII) characters in their tweets (stars, "real" smileys instead of emoticons). Those have been stored as "미êµì‹ 용등급ì´ë–¨ì–´ì¡Œë„¤", as in the example above. Is there a way to fix this and, if so, how?
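To make the first two points concrete, here is a minimal PHP sketch of what I have in mind. The sample strings are made up for illustration and do not come from my table. The second half is only my guess at the cause: it assumes the tweets arrived as UTF-8 and were read as Windows-1252 somewhere along the way, and that the mbstring extension is available.

```php
<?php
// Point 1: decode HTML entities back to plain characters.
// $raw is a hypothetical sample, not real data from my table.
$raw = '$S&amp;P news: @LucciAlerts &gt;&gt; $BAC';
$decoded = htmlspecialchars_decode($raw);
echo $decoded, PHP_EOL;

// Point 2 (assumption): the garbling appears when UTF-8 bytes are
// interpreted as Windows-1252 and then re-encoded as UTF-8.
$original = "America\u{2019}s"; // U+2019 is the curly apostrophe
$garbled  = mb_convert_encoding($original, 'UTF-8', 'Windows-1252');
echo $garbled, PHP_EOL;

// If that guess is right, reversing the conversion restores the text.
$restored = mb_convert_encoding($garbled, 'Windows-1252', 'UTF-8');
echo $restored, PHP_EOL;
```

If that assumption holds, the stored text itself might be recoverable, but I don't know whether the same round trip works for the third case, or how to apply it to rows already in the table.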
Any help is greatly appreciated!