Related to this question: "Fix" String encoding in Java
My project encoding is UTF-8.
I need to make a query to a DB that uses a particular varchar encoding (apparently EUC-KR).
I take the input as UTF-8, and I want to make the DB query with the EUC-KR encoded version of that string.
First of all, I can select and display the encoded strings using the following:
ResultSet rs = stmt.executeQuery("SELECT name FROM mytable");
while (rs.next()) {
    System.out.println(new String(rs.getBytes(1), "EUC-KR"));
}
I want to do something like:
PreparedStatement ps = conn.prepareStatement("SELECT * FROM MYTABLE WHERE NAME=?");
ps.setString(1,input);
ResultSet rs = ps.executeQuery();
This doesn't work, because my Java program is not using the same encoding as the DB. So I've tried replacing the middle line (the setString call) with each of the following, to no avail:
ps.setString(1,new String(input.getBytes("EUC-KR")));
ps.setString(1,new String(input.getBytes("EUC-KR"), "EUC-KR"));
ps.setString(1,new String(input.getBytes("UTF-8"), "EUC-KR"));
ps.setString(1,new String(input.getBytes("EUC-KR"), "UTF-8"));
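A note on why these attempts can't change much: a Java String is always UTF-16 internally, so encoding and decoding with the same charset just gives back an equal String, and decoding with a different charset gives a different (usually corrupted) String rather than "the same text in another encoding". The sketch below illustrates this with the sample word 고산; the class and method names are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class RoundTripDemo {
    static final Charset EUC_KR = Charset.forName("EUC-KR");

    // Encode and decode with the same charset: yields an equal String
    // as long as every character is representable in that charset.
    static String sameCharsetRoundTrip(String s) {
        return new String(s.getBytes(EUC_KR), EUC_KR);
    }

    // Encode with UTF-8 but decode as EUC-KR: yields different text.
    static String mismatchedRoundTrip(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), EUC_KR);
    }

    public static void main(String[] args) {
        String input = "\uACE0\uC0B0"; // the sample word from the question
        System.out.println(sameCharsetRoundTrip(input).equals(input)); // true
        System.out.println(mismatchedRoundTrip(input).equals(input));  // false
    }
}
```

In other words, the bytes that actually reach the database are chosen by the JDBC driver when it encodes the String, not by how the String was constructed.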
I am using Oracle 10g 10.1.0.
More details of my attempts follow:
What does seem to work is saving the name from the first query into a string without any other manipulation, and passing that back as a parameter. It matches itself.
That is,
ResultSet rs = stmt.executeQuery("SELECT name FROM mytable");
rs.next();
String myString = rs.getString(1);
PreparedStatement ps = conn.prepareStatement("SELECT * FROM mytable WHERE name=?");
ps.setString(1, myString);
rs = ps.executeQuery();
... will result in the one correct entry in rs. Great, so now I just need to convert my input to whatever format that string seems to be in.
However, nothing I have tried seems to match the "correct" string when I try reading their bytes using
byte[] mybytearray = myString.getBytes();
for (byte b : mybytearray) {
    System.out.print(b + " ");
}
In other words, I can turn °í»ê into 고산, but I can't seem to turn 고산 into °í»ê.
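For what it's worth, the garbled form °í»ê is what you get when the EUC-KR bytes of 고산 (B0 ED BB EA) are decoded as ISO-8859-1, a combination that isn't among the attempts listed earlier. Below is a minimal sketch of that conversion in both directions. The class and method names are made up for illustration, and whether the Oracle driver would pass such a string through unchanged to a US7ASCII database is an assumption that is not verified here.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MojibakeSketch {
    static final Charset EUC_KR = Charset.forName("EUC-KR");

    // Readable Korean text -> one char per EUC-KR byte (the garbled form).
    static String toLatin1View(String readable) {
        return new String(readable.getBytes(EUC_KR), StandardCharsets.ISO_8859_1);
    }

    // Garbled form -> readable Korean text. ISO-8859-1 maps each char
    // 0x00-0xFF back to the identical byte, so no information is lost.
    static String fromLatin1View(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), EUC_KR);
    }

    public static void main(String[] args) {
        String readable = "\uACE0\uC0B0";            // the Korean sample word
        String garbled = "\u00B0\u00ED\u00BB\u00EA"; // its garbled counterpart
        System.out.println(toLatin1View(readable).equals(garbled));   // true
        System.out.println(fromLatin1View(garbled).equals(readable)); // true
    }
}
```

If rs.getString(1) really returns the Latin-1 view, then ps.setString(1, toLatin1View(input)) would be the thing to try, but that depends on driver behaviour that the database character set does not guarantee.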
The byte array given by rs.getBytes(1) is different from the byte array given by any of the following:
rs.getString(1).getBytes()
rs.getString(1).getBytes("UTF8")
rs.getString(1).getBytes("EUC-KR")
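When comparing byte arrays like these, it can help to print them as unsigned hex and to always name the charset, since getBytes() with no argument uses the platform default. The following is a small sketch of such a helper; HexDump and hex are hypothetical names, and the sample word stands in for the value read from the database.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class HexDump {
    // Encode the string with the given charset and format each byte
    // as two uppercase hex digits separated by spaces.
    static String hex(String s, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(cs)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF)); // mask to treat the byte as unsigned
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String word = "\uACE0\uC0B0"; // the sample word from the question
        System.out.println(hex(word, Charset.forName("EUC-KR"))); // B0 ED BB EA
        System.out.println(hex(word, StandardCharsets.UTF_8));    // EA B3 A0 EC 82 B0
    }
}
```

Comparing this output with a hex dump of rs.getBytes(1) shows directly which encoding, if any, the stored bytes correspond to.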
Unhappiness: it turns out that for my DB, NLS_CHARACTERSET = US7ASCII, which means that what I'm trying to do is unsupported. Thanks for playing, everyone :(