No special syntax is required so long as:
- Your `server_encoding` includes those characters (if it's UTF-8, it does);
- Your `client_encoding` includes those characters;
- Your `client_encoding` correctly matches the encoding of the bytes you're actually sending.
The last one is what often trips people up. They think they can just change `client_encoding` with a `SET client_encoding` statement and it'll do some kind of magical conversion of whatever the client sends. That is not the case. `client_encoding` tells PostgreSQL "this is the encoding of the data you will receive from the client, and the encoding that the client expects to receive from you".
Setting `client_encoding` to `utf-8` doesn't make the client actually send UTF-8. That depends on the client. Nor do you have to send UTF-8; that string can also be represented in `iso-8859-2`, `iso-8859-4` and `iso-8859-10`, among other encodings.
What's crucial is that you tell the server the encoding of the data you're sending. As it happens, that string is the same in all three of the encodings mentioned, with the ę encoded as the single byte `0xea`, but in UTF-8 that'd be the two bytes `0xc4 0x99`. If you send UTF-8 to the server and tell it that it's `iso-8859-2`, the server can't tell you're wrong and will interpret those bytes as Ä followed by an invisible control character in iso-8859-2.
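The byte-level difference can be checked without a database. The following is a minimal sketch using only Python's built-in codecs; the sample text `"ę"` is an assumption standing in for the string from the question, and the codec names are Python's, not PostgreSQL's encoding names.

```python
# Hypothetical sample text; stands in for the string from the question.
text = "ę"

# The three single-byte encodings mentioned above (Python codec names).
latin_encodings = ["iso-8859-2", "iso-8859-4", "iso-8859-10"]
latin_bytes = {enc: text.encode(enc) for enc in latin_encodings}
utf8_bytes = text.encode("utf-8")

for enc, raw in latin_bytes.items():
    print(enc, raw.hex())          # one byte per encoding
print("utf-8", utf8_bytes.hex())   # two bytes

# Simulate the mismatch: the bytes are UTF-8, but they are
# interpreted as if they were iso-8859-2.
misread = utf8_bytes.decode("iso-8859-2")
print(repr(misread))
```

Decoding never raises here, which is the point: every byte value is a valid character in a single-byte encoding, so nothing on the receiving side can detect the wrong label.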
So... really, it depends on things like the system's default encoding, the encoding of any files/streams you're reading data from, etc. You have two options:
- Set `client_encoding` appropriately for the data you're working with and the default display locale of the system. This is easiest for simple cases, but harder when dealing with multiple different encodings in input or output.
- Set `client_encoding` to UTF-8 (or the same as `server_encoding`) and make sure that you always convert all input data into the encoding you set `client_encoding` to before sending it. You must also convert all data you receive from Pg back.
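The second option amounts to transcoding at the boundary. Here is a rough sketch, assuming the input arrives as raw bytes in a known source encoding; the helper names `to_client_encoding` and `from_client_encoding` are hypothetical, and the encoding arguments are Python codec names rather than PostgreSQL's.

```python
def to_client_encoding(raw: bytes, source_encoding: str,
                       client_encoding: str = "utf-8") -> bytes:
    """Re-encode input bytes into the encoding the server was told to expect."""
    # Decode with the encoding the data is really in, then encode
    # with the encoding that client_encoding is set to.
    return raw.decode(source_encoding).encode(client_encoding)


def from_client_encoding(raw: bytes, target_encoding: str,
                         client_encoding: str = "utf-8") -> bytes:
    """Convert bytes received from the server back to the local encoding."""
    return raw.decode(client_encoding).encode(target_encoding)


# Hypothetical input: the single byte for "ę" read from an iso-8859-2 file.
file_bytes = b"\xea"
to_send = to_client_encoding(file_bytes, "iso-8859-2")
round_trip = from_client_encoding(to_send, "iso-8859-2")
print(to_send.hex(), round_trip.hex())
```

Many client libraries do this conversion for you when you hand them native text strings instead of bytes; the sketch only makes the step explicit.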