I have a stored procedure that bulk imports a text file and inserts it into my database.

CREATE TABLE DBO.TEMP_STORE
(
    ID NVARCHAR(MAX),
    [MONTH] NVARCHAR(MAX),
    [YEAR] NVARCHAR(MAX),
    STORE NVARCHAR(MAX),
    SUBMITTAL NVARCHAR(MAX),
    ENTITY NVARCHAR(MAX),
    SUBMITTAL_TYPE NVARCHAR(MAX),
    iCOUNT NVARCHAR(MAX),
    STATE NVARCHAR(MAX),
    COUNTRY NVARCHAR(MAX),
    REGION NVARCHAR(MAX),
    GLOBAL_REGION NVARCHAR(MAX),
    LOCAL_CURRENCY NVARCHAR(MAX)
)

--SELECT * FROM DBO.TEMP_STORE
--prepare bulk insert query to load data from file to temp table
--@FilePath is a parameter of the surrounding stored procedure
DECLARE @SQL NVARCHAR(MAX)

SET @SQL = 'BULK INSERT DBO.TEMP_STORE FROM ''' + @FilePath + ''' '
SET @SQL = @SQL + 'WITH ('
SET @SQL = @SQL + 'DATAFILETYPE = ''char'','
SET @SQL = @SQL + 'FIELDTERMINATOR = ''|'','
--SET @SQL = @SQL + 'ROWTERMINATOR = ''\n'','
SET @SQL = @SQL + 'ROWTERMINATOR = ''' + NCHAR(10) + ''','
SET @SQL = @SQL + 'FIRSTROW = 2)'

--print @SQL
EXEC (@SQL)
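
For reference, once the concatenation runs, the statement handed to EXEC is equivalent to the static form below. The path is hypothetical, and '0x0a' is the hex notation BULK INSERT accepts for the bare line feed that NCHAR(10) builds above:

BULK INSERT DBO.TEMP_STORE
FROM 'C:\data\stores.txt' --hypothetical path for illustration
WITH (
    DATAFILETYPE = 'char',
    FIELDTERMINATOR = '|',
    ROWTERMINATOR = '0x0a', --hex form of the line feed produced by NCHAR(10)
    FIRSTROW = 2
)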

For instance, importing 'São Paulo' (as in Update ROLLING_CONE_SHOP_DETAIL set SHOP_STATE = 'São Paulo') results in the data being stored as 'S+úo Paulo' when I select the same row that was imported. I can update the row and explicitly fix the issue:

UPDATE TEMP_STORE 
SET STATE = 'São Paulo' 
WHERE STATE= 'S+úo Paulo'
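
(As an aside, since the columns are NVARCHAR, the literal is safest with the N prefix so it is parsed as Unicode regardless of the database's default collation; this is a hedged variant, not part of my original procedure:)

UPDATE TEMP_STORE
SET STATE = N'São Paulo' --N prefix makes the literal nvarchar
WHERE STATE = 'S+úo Paulo'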

But when I do my bulk insert, it does not retain the special character. I have confirmed the text file I am importing is saved as UTF-8 and contains the correct character.

How can I make sure my bulk insert retains the special character properly?

EDIT: This question differs from the "duplicate" question in that I am not looking to import data as UTF-8; I am looking to import data that retains my special characters.

Perhaps removing UTF-8 from the title would avoid the confusion.

  • SQL Server 2008 doesn't support UTF-8, I'm afraid. The first version that will is SQL Server 2019. You won't be able to use `BULK INSERT` to import a UTF-8 file reliably. – Thom A Jan 25 '19 at 15:59
  • You can use a powershell command like `gc data.txt | out-file data2.txt -Encoding Unicode` to re-encode the file. – David Browne - Microsoft Jan 25 '19 at 16:18
  • @Larnu how come I can store the information correctly via an update but not a bulk insert? – ninjasense Jan 25 '19 at 16:24
  • Because you're not using UTF-8 when you write that SQL Statement. – Thom A Jan 25 '19 at 16:32
  • @Larnu if I can bulk insert in the same encoding as the SQL statement, shouldn't it work regardless of the file encoding? – ninjasense Jan 25 '19 at 17:01
  • `UPDATE` isn't a `BULK INSERT`. They are 2 completely different things. `BULK INSERT` is reading the file from disc, which is in UTF-8 (and hence why the character is lost). When you write the `UPDATE` statement the **literal** string is interpreted in the collation that your database/column is using, and so the character isn't "lost". – Thom A Jan 25 '19 at 17:04
  • @Larnu I figured it out, all that needed to be done was to change the encoding of the file from UTF-8 to ANSI. The characters persisted through the Bulk Insert. – ninjasense Jan 25 '19 at 17:10
  • Yes, that's why @DavidBrowne-Microsoft suggested using powershell to change the encoding. – Thom A Jan 25 '19 at 17:11
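
A minimal sketch of the re-encoding workaround discussed in the comments: after converting the file to UTF-16 (what PowerShell's -Encoding Unicode produces), BULK INSERT can read it with DATAFILETYPE = 'widechar'. The path is hypothetical, and the re-encoded file will typically have Windows CRLF line endings, which the '\n' terminator matches:

BULK INSERT DBO.TEMP_STORE
FROM 'C:\data\stores_utf16.txt' --hypothetical path to the re-encoded file
WITH (
    DATAFILETYPE = 'widechar',  --read the file as UTF-16, preserving characters like 'ã'
    FIELDTERMINATOR = '|',
    ROWTERMINATOR = '\n',       --BULK INSERT expects \r\n when '\n' is specified here
    FIRSTROW = 2
)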

0 Answers