Problem with charset

Question

I have a MySQL database set to UTF-8, but the characters inside the database are ISO-8859-1 (ISO-8859-1 strings are stored as UTF-8). I've tried recode, but it only converted e.g. Ã¼ to ÃÂ¼. Does anybody out there have a solution?

The easiest way would be to re-import the data with the correct character set specified. Any way to do that? — Pekka, Jun 14 '11 at 11:04
Here is a duplicate with good answers: [I need help fixing Broken UTF8 encoding](http://stackoverflow.com/questions/1344692/i-need-help-fixing-broken-utf8-encoding) — Pekka, Jun 14 '11 at 11:07

score 0 · Answer 1 · answered Mar 05 '14 at 19:20

I just went through this. The biggest part of my solution was exporting the database to .csv and using Find / Replace on the characters in question. The character at issue may look like a space, so copy it directly from the cell to use as your Find parameter.

Once this is done - and missing this is what took me all morning:

Save the file as CSV ( MS-DOS )

Excellent post on the issue

Source of MS-DOS idea

Liv · Answer 2 · 2011-06-14T11:16:22.017

If you tried to store ISO-8859-1 characters in a database that is set to UTF-8, you have corrupted your "special characters": MySQL retrieves the bytes from the database and tries to assemble them as UTF-8 rather than ISO-8859-1. The only way to read the data correctly is to use a script that does something like:

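    // rs is a java.sql.ResultSet positioned on the row to read
    ResultSet rs = ...
    // Fetch the raw bytes instead of an already-decoded String
    byte[] b = rs.getBytes( COLUMN_NAME );
    // Reassemble the bytes as ISO-8859-1
    String s = new String( b, "ISO-8859-1" );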

This ensures you get the raw bytes (which, from what you said, came from an ISO-8859-1 string) and can then assemble them back into an ISO-8859-1 string. The other question is what you use to "view" the strings in the database: could it be that your console doesn't have the right charset to display those characters, rather than the characters being stored wrongly?
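To make the byte-level reasoning concrete without a database, here is a minimal sketch. The class name `CharsetDemo` and its methods are hypothetical, chosen only for illustration. The first two methods show the difference between assembling the same bytes as ISO-8859-1 and as UTF-8. The third covers a different case that may or may not match your data: it assumes the stored text is UTF-8 that was misread once as ISO-8859-1 and written back, which is what a value like Ã¼ in place of ü would suggest.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration; not part of the original answer.
final class CharsetDemo {

    // The approach described above: treat the raw column bytes as ISO-8859-1.
    static String decodeAsLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // What happens when the same bytes are assembled as UTF-8 instead:
    // invalid sequences are replaced with U+FFFD.
    static String decodeAsUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Assumption: the text is UTF-8 that was misread as ISO-8859-1 exactly once
    // (symptom: A-tilde followed by the one-quarter sign where u-umlaut is expected).
    // Recover the original bytes, then decode them as UTF-8.
    static String repairDoubleEncoded(String garbled) {
        byte[] original = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }
}
```

For the single byte 0xFC, `decodeAsLatin1` gives ü, while `decodeAsUtf8` gives a replacement character because 0xFC alone is not valid UTF-8. `repairDoubleEncoded("Ã¼")` gives ü. Note that `getBytes` silently substitutes `?` for any character outside ISO-8859-1, so a repair like this is best tried on a copy of the data first.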

NOTE: Updated the above after the last comment

The database is set to UTF-8; the strings stored in the DB are ISO-8859-1. — niklas, Jun 14 '11 at 11:12
I've just updated the code -- it's just a matter of using ISO-8859-1 when re-assembling the bytes into a String. — Liv, Jun 14 '11 at 11:16
