SQL Server generated "Text File" data importing in to postgressql using copy command

Question

I have one text file generated using SQL Server 2005. While I am importing the text file into one of my PostgreSQL table using "copy" it is giving me the following error:

ERROR: invalid byte sequence for encoding "UTF8": 0xff

Can any one tell me what i need to do get the data from SQL Server 2005 to PostgreSQL?

What type of column has the nul byte in it? What encoding is SQLServer dumping the text file in? Can you explicitly tell it to use UTF-8? — mu is too short, Feb 12 '11 at 04:54
SQLServer dumping the text file in character encoding and i am not sure about explicit usage of UTF-8 and character varying column has the nul byte in it — kallem, Feb 12 '11 at 05:16
Sounds like you have an encoding problem, I'd guess that SQLServer is using Windows-1252 (look at one of the 0xff characters, if it is "ÿ" then you have Windows-1252) but you need the character data in UTF-8. I'm not a Windows guy so I'm not sure how you would fix the encoding but this can't be an uncommon problem so hopefully someone will be along that knows what to do. — mu is too short, Feb 12 '11 at 06:20
Maybe this will help: http://www.sqlservercentral.com/Forums/Topic636752-149-1.aspx — mu is too short, Feb 12 '11 at 06:27
Thank You mu for the feedback and valuable time is there any one help out — kallem, Feb 12 '11 at 06:50

score 4 · Answer 1 · answered Nov 29 '12 at 20:20

I had exactly the same situation (except for SQL Server 2008 instead of 2005). When I was exporting the file using 'SQL Server Import and Export Wizard' and picked Flat File as Destination, the 'Code page' parameter defaulted to '1252 (ANSI - Latin I)'. Thus, when running copy command in postgreSQL I used set client_encoding to 'LATIN1'; copy tablename FROM 'path/to/file.csv' DELIMITERS '|' CSV; - and it executed successfully.

score 1 · Answer 2 · answered Feb 12 '11 at 09:03

1

If the file is indeeded in Windows-1252 encoding then you can switch the encoding by using set client_encoding=windows_1252 before running the copy command.

Check out the manual for a list of available encodings:
http://www.postgresql.org/docs/9.0/static/multibyte.html

answered Feb 12 '11 at 09:03

Hi,Thank You for the reply but how to i am not sure about the file whether it is windows-1252 or some thing else do u have any idea how to find that – kallem Feb 14 '11 at 05:25

score 1 · Answer 3 · edited May 23 '17 at 11:44

SQL Server text output (BCP files?) have in the past been written as UTF-16, which is a Unicode encoding that PostgreSQL doesn't support. UTF-16 files start with 0xff 0xfe (or 0xfe 0xff) so that would be one reason for getting a complaint about that particular byte value first.

On Linux or similar I'd suggest using the "recode" or "iconv" utilities to convert from UTF-16 to UTF-8, which is PostgreSQL's preferred all-Unicode encoding. One recommendation for performing that task on Windows is just to get a Windows version of recode: UTF-16 to UTF-8 conversion (for scripting in Windows)

SQL Server generated "Text File" data importing in to postgressql using copy command

3 Answers3