0

I have one text file generated using SQL Server 2005. While I am importing the text file into one of my PostgreSQL table using "copy" it is giving me the following error:

ERROR: invalid byte sequence for encoding "UTF8": 0xff

Can any one tell me what i need to do get the data from SQL Server 2005 to PostgreSQL?

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
kallem
  • 741
  • 2
  • 8
  • 13
  • What type of column has the nul byte in it? What encoding is SQLServer dumping the text file in? Can you explicitly tell it to use UTF-8? – mu is too short Feb 12 '11 at 04:54
  • SQLServer dumping the text file in character encoding and i am not sure about explicit usage of UTF-8 and character varying column has the nul byte in it – kallem Feb 12 '11 at 05:16
  • Sounds like you have an encoding problem, I'd guess that SQLServer is using Windows-1252 (look at one of the 0xff characters, if it is "ÿ" then you have Windows-1252) but you need the character data in UTF-8. I'm not a Windows guy so I'm not sure how you would fix the encoding but this can't be an uncommon problem so hopefully someone will be along that knows what to do. – mu is too short Feb 12 '11 at 06:20
  • Maybe this will help: http://www.sqlservercentral.com/Forums/Topic636752-149-1.aspx – mu is too short Feb 12 '11 at 06:27
  • Thank You mu for the feedback and valuable time is there any one help out – kallem Feb 12 '11 at 06:50

3 Answers3

4

I had exactly the same situation (except for SQL Server 2008 instead of 2005). When I was exporting the file using 'SQL Server Import and Export Wizard' and picked Flat File as Destination, the 'Code page' parameter defaulted to '1252 (ANSI - Latin I)'. Thus, when running copy command in postgreSQL I used set client_encoding to 'LATIN1'; copy tablename FROM 'path/to/file.csv' DELIMITERS '|' CSV; - and it executed successfully.

1

If the file is indeeded in Windows-1252 encoding then you can switch the encoding by using set client_encoding=windows_1252 before running the copy command.

Check out the manual for a list of available encodings:
http://www.postgresql.org/docs/9.0/static/multibyte.html

  • Hi,Thank You for the reply but how to i am not sure about the file whether it is windows-1252 or some thing else do u have any idea how to find that – kallem Feb 14 '11 at 05:25
1

SQL Server text output (BCP files?) have in the past been written as UTF-16, which is a Unicode encoding that PostgreSQL doesn't support. UTF-16 files start with 0xff 0xfe (or 0xfe 0xff) so that would be one reason for getting a complaint about that particular byte value first.

On Linux or similar I'd suggest using the "recode" or "iconv" utilities to convert from UTF-16 to UTF-8, which is PostgreSQL's preferred all-Unicode encoding. One recommendation for performing that task on Windows is just to get a Windows version of recode: UTF-16 to UTF-8 conversion (for scripting in Windows)

Community
  • 1
  • 1
araqnid
  • 127,052
  • 24
  • 157
  • 134