I'm trying to insert blog comments into a database for an NLP experiment, but I'm having an issue: I'm using prepared statements for the inserts, and all the single quotes are turning into question marks.

I'm testing on OS X and don't know the character encoding. I assume it's the default latin1_swedish_ci, etc., but after a few hours of scattered Googling I haven't been able to figure out how to determine it. I'm submitting something like "I didn't say that" as a parameter to:

PreparedStatement statement = connect.prepareStatement("INSERT IGNORE INTO bwog.article (article_id, date, title, content, url) VALUES (?, ?, ?, ?, ?)");
...
...
String s = "I didn't say that"; // not a literal in the real code, but println shows it like this
statement.setString(4, s);

and it's turning into "I didn?t say that" in the database after execution and all that.

I suspect there is some precondition that I didn't know about or forgot to fulfill.

SOLUTION: It was character encoding. Database and tables were in UTF-8 but command line connection was in latin1 for all the "character_set%" variables, so even though the data was fine it appeared garbled.
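
To illustrate how a quote can end up as a question mark, here is a minimal sketch. It assumes the comment text contains a typographic apostrophe (U+2019) rather than an ASCII one, which the question does not confirm, and it uses Java's ISO-8859-1 as a stand-in for a latin1 connection. MySQL's latin1 is not byte-for-byte identical to Java's ISO-8859-1, so this only shows the general mechanism. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class QuoteEncodingDemo {
    // Encode the text to bytes in the given charset, then decode it back
    // with the same charset. Characters the charset cannot represent are
    // replaced by the charset's substitute byte during encoding.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // Assumption: the apostrophe is U+2019 (typographic), not ASCII '.
        String s = "I didn\u2019t say that";

        // UTF-8 can represent U+2019, so the text survives unchanged.
        System.out.println(roundTrip(s, StandardCharsets.UTF_8).equals(s)); // prints "true"

        // ISO-8859-1 has no U+2019, so getBytes substitutes '?'.
        System.out.println(roundTrip(s, StandardCharsets.ISO_8859_1)); // prints "I didn?t say that"
    }
}
```

An ASCII apostrophe would survive both round trips, so seeing "?" is a hint that the original character is outside the character set used somewhere along the path.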

Patrick McGuire
  • What is the result of `Charset.defaultCharset()` on your system? And what is the character encoding of your database table? – Perception Mar 07 '12 at 05:19
  • This is definitely a character encoding issue. Either your database driver does not support the character set used by your database or schema, or you are using two different character sets in your code. See if this link is of any help to you: http://stackoverflow.com/questions/4724299/java-preparedstatement-setstring-changes-characters – S.P. Mar 07 '12 at 06:55
  • You should provide the answer you found for this question and 'accept' it. – Nathaniel Ford May 15 '12 at 01:53
  • Nice work Patrick McGuire but S.P. is right. Can you please post an answer to the question yourself and then accept that answer so that we can close this question? Also, you need to accept answers to previous questions if they fix your problem. – Zecas May 24 '12 at 10:35

1 Answer

In order to remove this from the "Unanswered" filter...

Prediction: Your problem is character encoding. I bet your database and tables are in UTF-8 but your command line connection is in latin1 for all the "character_set%" variables, so even though the data is fine it appears garbled.
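
One way to check that prediction from the Java side is sketched below. `SHOW VARIABLES LIKE 'character_set%'` is a standard MySQL statement; the `useUnicode` and `characterEncoding` URL properties are the ones MySQL Connector/J documents, but whether they are needed depends on the driver version, so treat that part as an assumption. The class name, helper names, host, and credentials are all hypothetical placeholders; only the `bwog` schema name comes from the question.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.LinkedHashMap;
import java.util.Map;

public class CharsetCheck {
    // Hypothetical helper: append Connector/J encoding properties to a JDBC URL.
    static String withUtf8(String jdbcUrl) {
        String sep = jdbcUrl.contains("?") ? "&" : "?";
        return jdbcUrl + sep + "useUnicode=true&characterEncoding=UTF-8";
    }

    // Read the character_set% variables as seen by this connection's session.
    static Map<String, String> charsetVariables(Connection conn) throws SQLException {
        Map<String, String> vars = new LinkedHashMap<>();
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("SHOW VARIABLES LIKE 'character_set%'")) {
            while (rs.next()) {
                vars.put(rs.getString(1), rs.getString(2)); // name, value
            }
        }
        return vars;
    }

    public static void main(String[] args) throws SQLException {
        // Placeholder URL; the host and port are assumptions.
        String url = withUtf8("jdbc:mysql://localhost:3306/bwog");
        if (args.length < 2) {
            // Without credentials, only show the URL that would be used.
            System.out.println("Usage: CharsetCheck <user> <password>; URL: " + url);
            return;
        }
        try (Connection conn = DriverManager.getConnection(url, args[0], args[1])) {
            charsetVariables(conn).forEach((k, v) -> System.out.println(k + " = " + v));
        }
    }
}
```

If the variables show latin1 for the client, connection, or results while the tables are UTF-8, the mismatch described above is the likely cause. In the command-line client, running `SET NAMES utf8` changes those three session variables for the current session.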

DreadPirateShawn