SELECT FROM OPENROWSET( BULK...) changing special characters

Question

I've been facing an issue for a few hours, and I can't seem to get my head around this one.

So I have a SQL Server database 2008R2, Collation SQL_Latin1_General_CP1_CI_AS. Inside there is a table, with a field named incoming_name. The collation of this field is also SQL_Latin1_General_CP1_CI_AS, and it is a NVARCHAR(255).

I have a .csv file with around 123000 rows. It's a basic csv, no double quotes around text, but no comma inside the fields, so when I run a manual import into my database it works fine. The incoming_name field contains all kind of text, but never longer than 255 characters. And in a few lines there are french accents (like 'Ch*â*teau d'Agassac').

Now I try to use the code

select 
    test_file.[INCOMING_NAME] COLLATE SQL_Latin1_General_CP1_CI_AS
    as [INCOMING_NAME]
    , test_file.[PRODUCT_CODE] AS [PRODUCT_CODE]
FROM
                OPENROWSET(
                BULK 'INSERT PATH OF THE .CSV HERE',
                FORMATFILE = 'INSERT PATH OF THE FORMAT FILE HERE',
                FIRSTROW = 2
                ) AS test_file

With the format file

<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <RECORD>
        <FIELD ID="4" xsi:type="CharTerm" TERMINATOR=',' MAX_LENGTH="255" COLLATION="SQL_LATIN1_GENERAL_CP1_CI_AS" />
        <FIELD ID="29" xsi:type="CharTerm" TERMINATOR='\r\n' />
    </RECORD>
    <ROW>
        <COLUMN SOURCE="4" NAME="INCOMING_NAME" xsi:type="SQLNVARCHAR"/>
        <COLUMN SOURCE="29" NAME="PRODUCT_CODE" xsi:type="SQLNVARCHAR"/>    
    </ROW>

The import works fine, and I get all my data, with the right values in the right fields, except for the accents...

For example when I add where test_file.incoming_name like '%agassac%' at the end of my query, I get a result like 'ChÃ¢teau d'Agassac' instead of the original data 'Château d'Agassac' in my database.

What I don't understand is that I feel like at every step of the process, I did pick an accent sensitive collation, with a unicode datatype (NVARCHAR), so I really don't understand why the import doesn't pick the accents.

Thanks for reading this long question,

John.

EDIT: Ok, it looks like the .csv file I want to import is encoded with utf-8, and SQL Server 2008 doesn't want to support utf-8 import. Now I have no idea what to do. Any idea welcome...

Another possible option here https://stackoverflow.com/questions/5498033/how-to-write-utf-8-characters-using-bulk-insert-in-sql-server/6840439#6840439 — Nick.Mc, Apr 18 '22 at 22:09

score 0 · Answer 1 · answered Jul 22 '13 at 19:07

0

I think adding widenative as DATAFILETYPE should resolve the issue. Please refer to this link for further details: http://msdn.microsoft.com/en-us/library/ms189941.aspx

answered Jul 22 '13 at 19:07

Sonam

3,406
1
12
24

Hi Sonam, thanks for your answer. The issue is, I tried to add DataFileType = 'widenative', but it looks like it is only allowed with the BULK INSERT TableName FROM 'csv path' WITH (options...) syntax or with the bcp command. I'll try to change my code to use BULK INSERT FROM instead of OPENROWSET. – Jonathan P. Jul 23 '13 at 09:25

SELECT FROM OPENROWSET( BULK...) changing special characters

1 Answers1