2

I have a sql statemwent, which contain a unicode specific sign. The unicode sign is ę in the polish word Przesunięcie. Please look at the following SQL INSERT Statement:

INSERT INTO res_bundle_props (res_bundle_id, value, name) 
VALUES(2, 'Przesunięcie przystanku', 'category.test');

I work with the Postgres Database. In which way can i insert the polish word with the unicode letter?

Clodoaldo Neto
  • 118,695
  • 26
  • 233
  • 260
  • 1
    What is the template you are using for this table? Does it support the character set? – Burhan Khalid Jul 01 '13 at 10:26
  • have a look at this http://stackoverflow.com/questions/11429655/can-not-insert-german-characters-in-postgres – Rohan Jul 01 '13 at 10:28
  • 1
    should work just fine. Are you sure the statement parameters are *received* correctly (from wherever you get them)? –  Jul 01 '13 at 10:48
  • See here: http://sqlfiddle.com/#!12/8091a/1 –  Jul 01 '13 at 11:14

2 Answers2

0

Find what are the server and client encodings:

show server_encoding;
 server_encoding 
-----------------
 UTF8

show client_encoding;
 client_encoding 
-----------------
 UTF8

Then set the client to the same encoding as the server:

set client_encoding = 'UTF8';
SET
Clodoaldo Neto
  • 118,695
  • 26
  • 233
  • 260
  • That's overly simplistic and often incorrect. Consider the case where the server is using utf-8 but the client is `psql` in an `iso-8859-8` encoded terminal. `ש` is a perfectly legal character in that terminal, and is encoded as `0xf9`. If you lie to Pg and say you're sending UTF-8 when you're really sending iso-8859-8, Pg will try to decode the client's `\xf9` as utf-8 and fail with an error complaining about bad Unicode input. It's rare that you need to change `client_encoding` directly; more often you just have to make sure it matches your inputs. – Craig Ringer Jul 01 '13 at 11:04
0

No special syntax is required so long as:

  • Your server_encoding includes those characters (if it's utf-8 it does);
  • Your client_encoding includes those characters;
  • Your client_encoding correctly matches the encoding of the bytes you're actually sending

The latter is the one that often trips people up. They think they can just change client_encoding with a SET client_encoding statement and it'll do some kind of magical conversion. That is not the case. client_encoding tells PostgreSQL "this is the encoding of the data you will receive from the client, and the encoding that the client expects to receive from you".

Setting client_encoding to utf-8 doesn't make the client actually send UTF-8. That depends on the client. Nor do you have to send utf-8; that string can also be represented in iso-8859-2, iso-8859-4 and iso-8859-10 among other encodings.

What's crucial is that you tell the server the encoding of the data you're sending. As it happens that string is the same in all three of the encodings mentioned, with the ę encoded as 0xae... but in utf-8 that'd be the two bytes 0xc4 0x99. If you send utf-8 to the server and tell it that it's iso-8859-2 the server can't tell you're wrong and will interpret it as Ä in iso-8859-2.

So... really, it depends on things like the system's default encoding, the encoding of any files/streams you're reading data from, etc. You have two options:

  • Set client_encoding appropriately for the data you're working with and the default display locale of the system. This is easiest for simple cases, but harder when dealing with multiple different encodings in input or output.

  • Set client_encoding to utf-8 (or the same as server_encoding) and make sure that you always convert all input data into the encoding you set client_encoding to before sending it. You must also convert all data you receive from Pg back.

Craig Ringer
  • 307,061
  • 76
  • 688
  • 778