I am exporting accented characters from a MySQL database to XML, but I am getting really wonky results.
For the basics: the MySQL table uses latin-1 (ISO-8859-1) encoding. Not ideal. However, all input is converted to HTML entities before it is stored, which seems to be working great; I can read data back all day long, and it looks correct on the screen.
Here is a sample item.
On the screen, it looks like this:
me hace reír
Note the accented "i" character (with acute accent).
In the database, it is stored like this:
me hace re&iacute;r
The "i" with the acute is properly replaced with the HTML entity, which allows for proper display on screen. If I wrap that inside a textarea, it still reads correctly: no HTML entity visible, just the correct accented "i" character.
My XML file has a proper UTF-8 header on it:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
But when I read the data from the DB and export it to the XML...
$xml.="<dedicatedBecause>".($dedicatedbecause)."</dedicatedBecause>"."\n";
With "$dedicatedbecause" holding a totally unprocessed piece of data from the DB, I get the following in my XML file:
me hace reÃ-r
In other words, a DIFFERENT accented character plus a dash. In other cases, I get other nonsense characters (a copyright symbol, various other accented letters, etc.).
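As far as I can tell (and this is only a guess), that pair looks like the two UTF-8 bytes of "í" (0xC3 0xAD) each being shown as a separate latin-1 character, where 0xAD is a soft hyphen. Here is a minimal sketch that reproduces the same garbage, assuming the mbstring extension is available; the variable names are made up for illustration and are not from my real code:

```php
<?php
// "í" encoded as UTF-8 is the two bytes 0xC3 0xAD.
$utf8 = "me hace re\xC3\xADr";

// Treat those bytes as ISO-8859-1 and convert them to UTF-8 a second time.
// 0xC3 becomes "Ã" (U+00C3) and 0xAD becomes a soft hyphen (U+00AD).
$garbled = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

// Dump both strings as hex so the extra bytes are visible.
echo bin2hex($utf8), "\n";
echo bin2hex($garbled), "\n";
```

If that is what is going on, the data is either being converted to UTF-8 twice or the file is being viewed as latin-1 somewhere along the way, but I have not been able to confirm which.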
I have a huge function for massaging data to UTF-8, but it doesn't seem to matter. If I turn it off, I get the same result.
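For comparison, this is a stripped-down version of what I would expect to be enough. It is not my actual function, just a sketch that assumes the stored value uses a named HTML entity such as `&iacute;` and hard-codes a sample value in place of the DB read:

```php
<?php
// Hypothetical stand-in for the raw column value read from the DB.
$dedicatedbecause = 'me hace re&iacute;r';

// Turn HTML entities into real UTF-8 characters.
$decoded = html_entity_decode($dedicatedbecause, ENT_QUOTES | ENT_HTML401, 'UTF-8');

// Escape only the characters that XML itself reserves (&, <, >, quotes).
$escaped = htmlspecialchars($decoded, ENT_XML1 | ENT_QUOTES, 'UTF-8');

$xml  = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>' . "\n";
$xml .= "<dedicatedBecause>" . $escaped . "</dedicatedBecause>" . "\n";

echo $xml;
```

The decode step is there because, as far as I know, XML only predefines five entities (`&amp;`, `&lt;`, `&gt;`, `&quot;`, `&apos;`), so an HTML entity like `&iacute;` cannot go into the file as-is.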
What gives? What am I missing here?
Thanks for your help!