0

I maintain a piece of software that runs as a Servlet and can make use of MySQL, Oracle or SQL Server as the DB backend - depending on what the Customer wants to use.

Everything works perfectly with MySQL and Oracle, and SQL Server works great too, except I cannot insert/update Unicode sequences into the database.

I can do a manual insert in SQL Server Management Studio of a unicode sequence like this

INSERT INTO mytable (msg) VALUES (N'Modern Standard Hindi (मानक हिन्दी), is a standardised and Sanskritised register of the Hindustani language.')

This data is output in my software correctly, so this verifies that the database and the web front end can both handle unicode no problem.

And here's my connection string

jdbc:jtds:sqlserver://<server ip>:1433/MyDb

As I said, Oracle and MySQL work perfectly using this setup. What's different about SQL Server?

Note: I also tried the official Microsoft-provided JDBC driver with exactly the same results.

If it makes a difference, I'm using JPA Repositories to do my DB interactions. The whole webapp is also set up as a SpringMVC application.

Edit: I also tried adding useUnicode=true;characterEncoding=UTF-8 to the end of my connection string with the same results

Will Warren
  • 1,294
  • 15
  • 33

2 Answers2

2

You may need to specify in your connection string that you're using Unicode, and also what encoding you're using. Maybe something like this?

jdbc:jtds:sqlserver://<server_ip>:1433;databaseName=MyDb;useUnicode=true;characterEncoding=UTF-8
Breandán Dalton
  • 1,619
  • 15
  • 24
  • Thanks, but unfortunately I already tried that to no avail. The results are identical. Inserting "فُصْحَى" results in "ÙÙØµÙØ­ÙÙ" being saved in the DB still. – Will Warren Sep 09 '14 at 14:44
  • You added 7 unicode codepoints 0641, 064f, 0635, 0652, 062d, 064e, 0649 (encoded in UTF8 as d981, d98f, d8b5, d992, d8ad,d98e, d989) and got, I surmise, the 14 codepoints 00d9 0081, 00d9, 008f, 00d8, 00b5, 00d9, 0092, 00d8, 00ad, 00d9, 008e, 00d9, 0089 (some of them aren't printable which is why it looks like only 8 characters). In other words, the characters were correctly converted into UTF-8 first, then another bit of software came along and treated each byte of the encoding as an 8-bit ISO 8859-1 character, and converted them into unicode codepoints (or utf-16 or ucs2 encodings of them). – Breandán Dalton Sep 09 '14 at 15:39
  • Note this question: http://stackoverflow.com/questions/12512687/sql-server-utf8-howto which implies that if you are using SQL server 2008, you are up a creek. – Breandán Dalton Sep 09 '14 at 15:46
  • Thanks for the insights. I'm using SQL Server 2012 (thank god) I've added a bunch of debugging on hibernate to see if it's really breaking where I think its breaking. I flipped back to my Oracle DB to eliminate a few more code paths. I'm zeroing in! – Will Warren Sep 09 '14 at 15:51
2

So there was one thing I left off my original question.

I had the system set up with Spring Security and a CharacterEncodingFilter in the filter chain to force UTF-8 in all requests and responses, but I had it set up in the afterSpringSecurityChain section of my web security initializer.

I moved the filter into the beforeSpringSecurityChain method and boom - everything UTF-8 works perfectly with all DB vendors.

I ended up not adding any params to my connection string or anything either. It was literally just the filter that I changed.

Will Warren
  • 1,294
  • 15
  • 33