I have a PostgreSQL 9.2 database whose encoding is UTF-8. I have an application (written in Java) that updates this database by reading .sql files and executing them against it. But I found a problem: one of those .sql files contains the following statement:

insert into usuario(nome)
values('Usuário Padrão');

After executing this, when I look at the table data, what was actually inserted is this: "UsuÃ¡rio PadrÃ£o"

If I execute this command directly from pgAdmin, the row is created correctly. So I don't know whether the problem is in the database or in the program that executes the scripts.

---EDIT---

Here is how I get a JDBC connection:

public static Connection getConnection() throws SQLException{
    Connection connection;
    String url="jdbc:postgresql://"+servidor+":"+porta+"/"+nomeBanco;
    Properties props = new Properties();  
    props.put("user", usuario);  
    props.put("password", senha);
    connection=DriverManager.getConnection(url,props);
    connection.setAutoCommit(false);
    return connection;
}

And here is the code I use to read the file. It looks correct to me, because if I print the String read from the file, it shows the correct String.

public static String lerArquivo(File arquivo){
    StringBuilder conteudo=new StringBuilder();
    BufferedReader br = null;
    try {
        br=new BufferedReader(new FileReader(arquivo));
        String linha;
        while((linha=br.readLine())!=null){
            conteudo.append(linha).append("\n");
        }
    } catch (IOException e) {
        FrameErroBasico f=new FrameErroBasico(null, true);
        f.setText("Erro ao ler arquivo.",e);
        f.setVisible(true);
    }finally{
        try{br.close();}catch(Exception e){}
    }
    return conteudo.toString();
}
Mateus Viccari

3 Answers

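The problematic line is most likely the one that creates the `FileReader`: it decodes the file using the platform default charset, which may well not be UTF-8. Specify the encoding explicitly instead: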

    br=new BufferedReader(new InputStreamReader(new FileInputStream(arquivo), "UTF-8"));

(looks like my crystal ball is still working well!)
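
For reference, here is a minimal sketch of the whole read method with the encoding made explicit, assuming the .sql files really are saved as UTF-8. The class name `SqlFileReader`, the temp-file demo in `main`, and the choice to propagate `IOException` instead of showing an error dialog are illustrative assumptions, not the asker's actual code.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

public class SqlFileReader {

    // Reads the whole file as UTF-8, regardless of the platform default charset.
    public static String lerArquivo(File arquivo) throws IOException {
        StringBuilder conteudo = new StringBuilder();
        // try-with-resources closes the reader even if readLine() throws.
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream(arquivo), StandardCharsets.UTF_8))) {
            String linha;
            while ((linha = br.readLine()) != null) {
                conteudo.append(linha).append("\n");
            }
        }
        return conteudo.toString();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical demo: write a UTF-8 script to a temp file and read it back.
        // Unicode escapes for the accented letters keep this source file ASCII-only.
        String sql = "insert into usuario(nome) values('Usu\u00e1rio Padr\u00e3o');\n";
        File tmp = File.createTempFile("usuario", ".sql");
        tmp.deleteOnExit();
        Files.write(tmp.toPath(), sql.getBytes(StandardCharsets.UTF_8));
        // Compare in memory instead of printing, so the console encoding cannot mislead.
        System.out.println(sql.equals(lerArquivo(tmp)));
    }
}
```

Using the `StandardCharsets.UTF_8` constant (Java 7 and later) instead of the string "UTF-8" avoids having to handle `UnsupportedEncodingException`.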

jtahlborn
  • Yeah that was the problem! Though there was a scratch in the crystal ball. If anyone is having the same issue, the right way to solve it is like this: br=new BufferedReader(new InputStreamReader(new FileInputStream(arquivo),"UTF-8")); Thank you! – Mateus Viccari Jul 24 '13 at 18:31
  • @MateusViccari - ooops, yeah, swapped my streams, should have written it a little more carefully. the issue wasn't the ball, just my fat fingers. – jtahlborn Jul 24 '13 at 19:00

To be sure I'd need to see the code that reads the SQL file in, but (as pointed out by jtahlborn) I'd say you're reading the file with an encoding other than the encoding it really has.

PgJDBC uses Unicode on the Java side and takes care of client/server encoding differences by always communicating with the server in utf-8, letting the server do any required encoding conversions. So unless you set client_encoding via your PgJDBC connection - something PgJDBC tries to detect and warn you about - the problem won't be on the PostgreSQL/PgJDBC side, it'll be with misreading the file.

Specifically, it looks like the file is utf-8 encoded, but you are reading it in as if it was latin-1 (ISO-8859-1) encoded. Witness this simple demo in Python to replicate the results you are getting by converting a native Unicode string to utf-8 then decoding it as if it was latin-1:

>>> print u'Usuário Padrão'.encode("utf-8").decode("latin-1");
UsuÃ¡rio PadrÃ£o
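
The same effect can be reproduced in Java, which may be closer to what the application is doing. This is only a sketch: the class name `MojibakeDemo` and the helper `misdecode` are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Encodes text as UTF-8, then wrongly decodes those bytes as ISO-8859-1.
    public static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Unicode escapes for the accented letters keep this source file ASCII-only.
        String original = "Usu\u00e1rio Padr\u00e3o";
        String mangled = misdecode(original);

        // Each accented letter is two bytes in UTF-8, so it turns into two characters.
        System.out.println(original.length() + " -> " + mangled.length()); // prints 14 -> 16

        // The mangled text is exactly the string that ended up in the table.
        System.out.println(mangled.equals("Usu\u00c3\u00a1rio Padr\u00c3\u00a3o")); // prints true
    }
}
```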

Your application most likely reads the file into a String in a manner that performs inappropriate text encoding conversions from the file encoding to the Unicode text that Java works with internally. There is no reliable way to "auto-detect" the encoding of a file, so you must specify the text encoding of the input when reading a file. Java typically defaults to the system encoding, but that can be overridden. If you know the encoding of the file, you should explicitly pass it when opening the file for reading.
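
To see which encoding the JVM falls back to when none is passed, something like the following sketch can be run on the machine that executes the scripts. The class name `DefaultCharsetCheck` is made up for illustration.

```java
import java.nio.charset.Charset;

public class DefaultCharsetCheck {

    // Name of the charset used by APIs such as FileReader when none is given.
    public static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println("Default charset: " + defaultCharsetName());
        // The system property the default is usually derived from; may differ by JVM.
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));
    }
}
```

If this reports something other than UTF-8 (for example windows-1252 or ISO-8859-1), a `FileReader` on a UTF-8 file will produce mangled text. On many JVMs the default can be changed at startup with `-Dfile.encoding=UTF-8`, but passing the charset explicitly in code does not depend on how the JVM was launched.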

You haven't shown the code that reads the file so it's hard to be more specific, but this is really a Java-side issue, not a PostgreSQL-side one. If you System.out.println your SQL file from Java you'll see that it is already mangled in your Java string before you send it to the database server.

Craig Ringer
  • very verbose description which is basically the same as my first comment. unfortunately, the OP has not shown the relevant code. – jtahlborn Jul 24 '13 at 02:57
  • @jtahlborn Explaining things to people who don't understand them is a significant part of what this site is about, no? – Craig Ringer Jul 24 '13 at 02:58
  • indeed. my comment wasn't necessarily a criticism, however, it's all just a shot in the dark without the relevant code from the OP. – jtahlborn Jul 24 '13 at 03:04
  • I will add the code I use to read the file. But I've already tried doing a System.out.println() on the String I execute via PreparedStatement, and it shows the correct String in the console, without the strange characters. – Mateus Viccari Jul 24 '13 at 11:11
  • @MateusViccari That would actually make sense if your console is in the same encoding. What if you print the *bytes* of the string? Print the contents of the byte array returned by `thestring.getBytes("utf-8")` and post it here. – Craig Ringer Jul 25 '13 at 01:01
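
The byte check suggested in the last comment could look roughly like this. It is a sketch only: the class name `ByteDump` and the helper `hex` are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

public class ByteDump {

    // Returns the UTF-8 bytes of s as space-separated lowercase hex pairs.
    public static String hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            // Mask with 0xff so negative byte values print as two hex digits.
            sb.append(String.format("%02x ", b & 0xff));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Unicode escape for the accented letter keeps this source file ASCII-only.
        System.out.println(hex("Usu\u00e1rio")); // prints 55 73 75 c3 a1 72 69 6f
    }
}
```

A correctly decoded string shows `c3 a1` for the accented letter. A string that was misread as Latin-1 shows `c3 83 c2 a1` in its place, even when the console happens to display it correctly.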

As jtahlborn said, the right way to read the file is like this:

br=new BufferedReader(new InputStreamReader(new FileInputStream(arquivo),"UTF-8"));

That was my problem. With the encoding specified like this, it works like a charm.

Mateus Viccari