
I have a Persian CSV file and I need to import it into SQL Server using BULK INSERT:

I wrote this statement:

BULK INSERT TEMP
FROM 'D:\t1.csv'
WITH(
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n',
CODEPAGE = '1256'
);

but it cannot read the UTF-8 encoding and imports the ی character as a ? character.

How can I fix this?

wovano
behzad razzaqi

4 Answers


1. Go to the BULK INSERT documentation on MSDN.

2. Find the section on CODEPAGE.

3. See the note that says:
SQL Server does not support code page 65001 (UTF-8 encoding).

4. Research further, find Use Unicode Character Format to Import or Export Data (SQL Server), and see if that helps.
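As a sketch of where step 4 leads: if the CSV is first re-saved in a Unicode encoding (UTF-16 LE, which Notepad labels "Unicode"), a statement along these lines should work. This is an assumption based on that article, reusing the table and path from the question, not a tested fix:

```sql
-- Sketch only: assumes D:\t1.csv has been re-saved as UTF-16 LE
-- ("Unicode" in Notepad), which is what 'widechar' expects.
BULK INSERT TEMP
FROM 'D:\t1.csv'
WITH (
    DATAFILETYPE = 'widechar',  -- treat the data file as UTF-16
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n'
);
```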

marc_s
  • The documentation currently states "**Important:** Versions prior to SQL Server 2016 (13.x) do not support code page 65001 (UTF-8 encoding)." – user1069816 Oct 12 '21 at 15:59
  • @user1069816: true - but in August 2015, when I wrote this answer, there was no SQL Server 2016 yet... and as far as I know, UTF-8 support was really only added in earnest in SQL Server **2019** just recently – marc_s Oct 12 '21 at 16:01

This problem still exists in SQL Server 2017; see here and here.

If your import is just an occasional exercise, i.e. if it's OK to import without a script at all, what worked for me was simply importing the CSV using Tasks -> Import -> Flat file.

I'm adding this here because this page ranks high when you Google 'SQL Server does not support code page 65001'. Hope it helps someone.

RolfBly

I went through the documentation @marc_s linked to, and found the usage of DATAFILETYPE = 'widechar'.
I then went ahead and tried it with my UTF-8 CSV file, but it didn't work, giving me the error:

[...] the data file does not have a Unicode signature

I then re-saved my csv file with Notepad's Unicode format, retried the import, and voila, success.

  • Make sure all commas and line breaks are escaped (see here for how to save a valid CSV).

My full script (I'm using SQL Server 2017):

BULK INSERT [my_table]
FROM 'C:\path\to\file.csv'
WITH
(
    FORMAT = 'CSV', 
    FIRSTROW = 2,           -- if you have a title row, the first data row is 2nd
    FIELDTERMINATOR = ',', 
    KEEPIDENTITY,           -- remove it if you don't want identity to be kept
    ROWTERMINATOR = '\n',   
    DATAFILETYPE = 'widechar', 
    ERRORFILE = 'C:\path\to\file_err.txt',
    KEEPNULLS,
    TABLOCK
)

Notes:

  • Make sure your date fields are in a valid SQL format.
  • Regarding KEEPNULLS, read this question (e.g., if you have NULLs in your file, replace them with an empty string).
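Conversely, if literal empty strings do end up in the table where you wanted NULLs, a post-import cleanup along these lines can restore them. This is a generic sketch, with placeholder table and column names, not part of the script above:

```sql
-- Sketch only: [my_table] and [my_column] are placeholder names.
-- NULLIF returns NULL when its two arguments are equal.
UPDATE [my_table]
SET [my_column] = NULLIF([my_column], '');
```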
OfirD

In addition to the earlier answers, which are now deprecated or obsolete, I want to point out that as of today, May 2022, with release version 15.0.2080.9 (SQL Server 2019), this works flawlessly for UTF-8.

  • Create a UTF-8 encoded file (I used one with a BOM)

then

BULK INSERT #tempTable1
FROM 'C:\....\file.csv'
WITH (
    CODEPAGE = '65001',
    FIRSTROW = 2,           -- skip the first line
    FIELDTERMINATOR = ';',
    ROWTERMINATOR = '\n'
)
GO

It works flawlessly for me, with many French and other characters.

Marcel