
We built a Java EE web project and use JDBC to store our data. The problem is that German umlauts such as ä, ö and ü are in use and are stored correctly in the MySQL database. We don't know why, but in the browser those characters are broken and show up as garbage like

ö�

instead. I've already tried setting the encoding of the JDBC connection as described in this question:

JDBC character encoding

The encoding of the HTML page is also set correctly:

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />

Any ideas how to fix that?


Update

connection.prepareStatement("SET CHARACTER SET utf8").execute();

doesn't make the umlauts work. Changing the meta tag to

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

doesn't change anything either.

f4lco

1 Answer


"We don't know why, but in the browser those characters are broken"

Well, that's the first thing to find out. You should trace your data at every stage:

  • As you fetch it out of the database (with logging)
  • When you inject it into the page (with logging)
  • On the wire (via Wireshark)

When you log, don't just log the strings: log the Unicode characters that make up the strings, as integers. Just cast each character in the string to an integer and log it. It's primitive, but it'll tell you what you need to know.
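The logging described above could look roughly like the following. This is only a minimal sketch: the class name `CodePointDump` and the helper `describe` are made up for illustration, and in the real application the string would come from the `ResultSet` rather than a literal.

```java
import java.util.StringJoiner;

class CodePointDump {
    // Hypothetical helper: renders every char of the string as its integer value,
    // e.g. "U+00F6(246)", so the log shows what the string really contains.
    static String describe(String s) {
        StringJoiner out = new StringJoiner(" ");
        for (int i = 0; i < s.length(); i++) {
            int value = s.charAt(i); // the cast to int the answer suggests
            out.add(String.format("U+%04X(%d)", value, value));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // \u00f6 is 'ö'; the escape keeps the example independent of the source file encoding.
        System.out.println(describe("sch\u00f6n"));
        // prints: U+0073(115) U+0063(99) U+0068(104) U+00F6(246) U+006E(110)
    }
}
```

A correctly decoded 'ö' shows up as the single value 246. Anything else at that position means the string was already damaged before it was logged.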

When you look on the wire, of course, you'll be seeing bytes rather than characters as such. You should work out what bytes you expect for your chosen encoding, and check those against what's actually coming across the network.
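To work out which bytes to expect, it may help to let Java do the encoding and print the result in hex, which is how a packet capture usually shows it. A small sketch (the class and method names are hypothetical):

```java
import java.nio.charset.StandardCharsets;

class ExpectedBytes {
    // Formats bytes as unsigned two-digit hex values separated by spaces.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "\u00f6"; // 'ö'
        System.out.println("ISO-8859-1: " + hex(text.getBytes(StandardCharsets.ISO_8859_1))); // F6
        System.out.println("UTF-8:      " + hex(text.getBytes(StandardCharsets.UTF_8)));      // C3 B6
    }
}
```

If the page is declared as ISO-8859-1, a single F6 byte is expected for 'ö'; if it is declared as UTF-8, the pair C3 B6 is expected.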

You've specified the encoding in the HTML - but have you told whatever's generating your page that you want it in ISO Latin 1? That's likely to be responsible for both setting the content-type header and performing the actual conversion from text to bytes.
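The point about the generator can be illustrated without a servlet container. The sketch below uses a made-up `render` helper that takes the declared charset and the byte conversion from the same `Charset` object, so the two cannot disagree. In a servlet the corresponding setting would typically be the response's character encoding (for example `response.setCharacterEncoding(...)`, or the `charset` part of `response.setContentType(...)`), applied before the writer is obtained. Whether that is what is missing here is an assumption, not something the question confirms.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class PageEncoding {
    // Hypothetical helper: writes a meta tag and a body through a Writer that
    // encodes with the very charset the meta tag announces.
    static byte[] render(String body, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, charset)) {
            w.write("<meta http-equiv=\"Content-Type\" content=\"text/html; charset="
                    + charset.name() + "\" />");
            w.write(body);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] latin1 = render("\u00f6", StandardCharsets.ISO_8859_1);
        byte[] utf8 = render("\u00f6", StandardCharsets.UTF_8);
        // The trailing bytes are the encoded 'ö': one byte in Latin 1, two in UTF-8.
        System.out.println(latin1[latin1.length - 1] & 0xFF); // 246
        System.out.println((utf8[utf8.length - 2] & 0xFF) + " " + (utf8[utf8.length - 1] & 0xFF)); // 195 182
    }
}
```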

Additionally, is there any reason why you're using ISO Latin 1 instead of UTF-8? Why would you deliberately restrict yourself like that? (ISO Latin 1 can only handle the first 256 characters of Unicode, instead of the full range of Unicode characters. UTF-8 can handle everything, and is just as efficient for ASCII.)
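The difference in coverage is easy to check with the standard `CharsetEncoder.canEncode` method. In this sketch the class name and the `canEncode` wrapper are illustrative only, and the euro sign is just one example of a character outside the first 256 code points:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetCoverage {
    // Returns true if every character of the text can be represented in the charset.
    static boolean canEncode(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String umlaut = "\u00f6"; // 'ö', U+00F6, within the first 256 code points
        String euro = "\u20ac";   // '€', U+20AC, outside them

        System.out.println(canEncode(umlaut, StandardCharsets.ISO_8859_1)); // true
        System.out.println(canEncode(euro, StandardCharsets.ISO_8859_1));   // false
        System.out.println(canEncode(euro, StandardCharsets.UTF_8));        // true

        // Plain ASCII text takes one byte per character in both encodings.
        String ascii = "umlaut";
        System.out.println(ascii.getBytes(StandardCharsets.ISO_8859_1).length); // 6
        System.out.println(ascii.getBytes(StandardCharsets.UTF_8).length);      // 6
    }
}
```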

Jon Skeet
  • Well, then I'll give logging a try. I'm using standard servlets and a few tags. – f4lco Jan 27 '11 at 17:41
  • As it turns out, the characters are already broken directly after retrieving them from the database. – f4lco Jan 27 '11 at 17:46
  • @phineas: Okay, so your next step is to write a *console* application which tries to fetch the data from the database and log it in the same way. That way you don't need to mess around with getting the webapp part right - it's much easier to tinker with a console app, IMO. – Jon Skeet Jan 27 '11 at 17:51
  • I did some logging, and for instance, the German 'ö' is split into two characters, 'Ã' (int: 195) and '¶' (int: 182). – f4lco Jan 27 '11 at 17:52
  • @phineas: How exactly did you do the logging you mention ("I did some logging")? – Gilles Quénot May 29 '11 at 16:09
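
The values reported in the comments, 195 and 182, are the two UTF-8 bytes of 'ö' (0xC3 0xB6), each read as a separate ISO Latin 1 character. That would be consistent with UTF-8 bytes being decoded as Latin 1 somewhere between the database and the application, although the thread does not establish where. A minimal sketch that reproduces the pattern (the names are hypothetical):

```java
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // Encodes the text as UTF-8, then deliberately decodes those bytes with the
    // wrong charset (ISO-8859-1) to reproduce the broken characters.
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String broken = misdecode("\u00f6"); // 'ö'
        for (int i = 0; i < broken.length(); i++) {
            System.out.println((int) broken.charAt(i)); // prints 195, then 182
        }
    }
}
```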