0

I'm trying to import a .CSV file into a SQL Server table using a BULK INSERT with a format file. I can get it to import, but any Latin characters are imported as strange characters. I take a lot of pride in completing this personal project on my own, but I've hit a point where I just need help. I can change the characters by doing some messy UPDATE and REPLACE statements after importing the data, but I really want to be able to import the Latin characters as they appear in the .CSV file in one step.

Here's the database and table that I created:

CREATE DATABASE Test;

CREATE TABLE dbo.rawData
    ([Position] nvarchar(500) NULL,
    [Const] nvarchar(500) NULL,
    [Created] nvarchar(500) NULL,
    [Modified] nvarchar(500) NULL,
    [Description] nvarchar(500) NULL,
    [Title] nvarchar(500) NOT NULL,
    [TitleType] nvarchar(500) NULL,
    [Directors] nvarchar(500) NULL,
    [YouRated] nvarchar(500) NULL,
    [IMDbRating] nvarchar(500) NULL,
    [Runtime] nvarchar(500) NULL,
    [Year] nvarchar(500) NULL,
    [Genres] nvarchar(500) NULL,
    [NumVotes] nvarchar(500) NULL,
    [ReleaseDate] nvarchar(500) NULL, 
    [URL] nvarchar(500) NULL,
    )
GO

And here is some of the data that I'm working with taken from a .CSV file (saved as ratings.csv). I use Notepad++ and it is encoded in UTF-8. Notice how the last row for "Dallas Buyers Club" has a director with a Latin character in his name:

"position","const","created","modified","description","Title","Title type","Directors","You rated","IMDb Rating","Runtime (mins)","Year","Genres","Num. Votes","Release Date (month/day/year)","URL"
"1","tt0437863","Tue Feb 16 00:00:00 2016","","","The Benchwarmers","Feature Film","Dennis Dugan","5","5.6","80","2006","comedy, romance, sport","39413","2006-04-07","http://www.imdb.com/title/tt0437863/"
"2","tt0085334","Tue Feb 16 00:00:00 2016","","","A Christmas Story","Feature Film","Bob Clark","6","8.1","94","1983","comedy, family","103770","1983-11-18","http://www.imdb.com/title/tt0085334/"
"3","tt2403029","Tue Feb 16 00:00:00 2016","","","The Starving Games","Feature Film","Jason Friedberg, Aaron Seltzer","2","3.3","83","2013","comedy","13719","2013-10-31","http://www.imdb.com/title/tt2403029/"
"4","tt0316465","Tue Feb 16 00:00:00 2016","","","Radio","Feature Film","Michael Tollin","6","6.9","109","2003","biography, drama, sport","31692","2003-10-24","http://www.imdb.com/title/tt0316465/"
"5","tt0141369","Tue Feb 16 00:00:00 2016","","","Inspector Gadget","Feature Film","David Kellogg","4","4.1","78","1999","action, adventure, comedy, family, sci_fi","35340","1999-07-18","http://www.imdb.com/title/tt0141369/"
"6","tt0033563","Tue Feb 16 00:00:00 2016","","","Dumbo","Feature Film","Sam Armstrong, Norman Ferguson","6","7.3","64","1941","animation, family, musical","80737","1941-10-23","http://www.imdb.com/title/tt0033563/"
"7","tt0384642","Tue Feb 16 00:00:00 2016","","","Kicking & Screaming","Feature Film","Jesse Dylan","5","5.5","95","2005","comedy, family, romance, sport","29539","2005-05-01","http://www.imdb.com/title/tt0384642/"
"8","tt0116705","Tue Feb 16 00:00:00 2016","","","Jingle All the Way","Feature Film","Brian Levant","7","5.4","89","1996","comedy, family","66879","1996-11-16","http://www.imdb.com/title/tt0116705/"
"9","tt1981677","Tue Feb 16 00:00:00 2016","","","Pitch Perfect","Feature Film","Jason Moore","7","7.2","112","2012","comedy, music, romance","203205","2012-09-28","http://www.imdb.com/title/tt1981677/"
"10","tt0409459","Tue Feb 16 00:00:00 2016","","","Watchmen","Feature Film","Zack Snyder","7","7.6","162","2009","action, mystery, sci_fi","368137","2009-02-23","http://www.imdb.com/title/tt0409459/"
"11","tt1343092","Tue Feb 16 00:00:00 2016","","","The Great Gatsby","Feature Film","Baz Luhrmann","5","7.3","143","2013","drama, romance","345664","2013-05-01","http://www.imdb.com/title/tt1343092/"
"12","tt0332379","Tue Feb 16 00:00:00 2016","","","School of Rock","Feature Film","Richard Linklater","5","7.1","108","2003","comedy, music","202083","2003-09-09","http://www.imdb.com/title/tt0332379/"
"13","tt0120783","Tue Feb 16 00:00:00 2016","","","The Parent Trap","Feature Film","Nancy Meyers","6","6.4","128","1998","adventure, comedy, drama, family, romance","82087","1998-07-20","http://www.imdb.com/title/tt0120783/"
"14","tt0790636","Tue Feb 16 00:00:00 2016","","","Dallas Buyers Club","Feature Film","Jean-Marc Vallée","7","8.0","117","2013","biography, drama","308118","2013-09-07","http://www.imdb.com/title/tt0790636/"

I have a format file (saved as format.fmt), that when opened in Notepad++ looks like this:

11.0
16
1       SQLCHAR             0       1000    "\",\""    1     Position                   SQL_Latin1_General_CP1_CI_AS
2       SQLCHAR             0       1000    "\",\""    2     Const                      SQL_Latin1_General_CP1_CI_AS
3       SQLCHAR             0       1000    "\",\""    3     Created                    SQL_Latin1_General_CP1_CI_AS
4       SQLCHAR             0       1000    "\",\""    4     Modified                   SQL_Latin1_General_CP1_CI_AS
5       SQLCHAR             0       1000    "\",\""    5     Description                SQL_Latin1_General_CP1_CI_AS
6       SQLCHAR             0       1000    "\",\""    6     Title                      SQL_Latin1_General_CP1_CI_AS
7       SQLCHAR             0       1000    "\",\""    7     TitleType                  SQL_Latin1_General_CP1_CI_AS
8       SQLCHAR             0       1000    "\",\""    8     Directors                  SQL_Latin1_General_CP1_CI_AS
9       SQLCHAR             0       1000    "\",\""    9     YouRated                   SQL_Latin1_General_CP1_CI_AS
10      SQLCHAR             0       1000    "\",\""    10    IMDbRating                 SQL_Latin1_General_CP1_CI_AS
11      SQLCHAR             0       1000    "\",\""    11    Runtime                    SQL_Latin1_General_CP1_CI_AS
12      SQLCHAR             0       1000    "\",\""    12    Year                       SQL_Latin1_General_CP1_CI_AS
13      SQLCHAR             0       1000    "\",\""    13    Genres                     SQL_Latin1_General_CP1_CI_AS
14      SQLCHAR             0       1000    "\",\""    14    NumVotes                   SQL_Latin1_General_CP1_CI_AS
15      SQLCHAR             0       1000    "\",\""    15    ReleaseDate                SQL_Latin1_General_CP1_CI_AS
16      SQLCHAR             0       1000    "\"\n"     16    URL                        SQL_Latin1_General_CP1_CI_AS

When I run the following code, everything imports, however the Latin characters are replaced with a series of strange characters. A few things I tried were changing the .CSV file to UCS-2 BE, adding different conditions in the WITH clause of the BULK INSERT, and changing the variable type in the format file to SQLNCHAR instead of SQLCHAR, but nothing worked. Often what happens in these cases is "0 rows are affected", rather than an error. Any help would be so appreciated.

Walker
  • 153
  • 2
  • 9
  • Perhaps start here http://stackoverflow.com/questions/1628957/allow-special-characters-sql-server-2008 – John Cappelletti Jun 16 '16 at 18:20
  • @JohnCappelletti Thanks for the response, but this is much different from the problem I'm facing. I have read numerous forums looking for this scenario but nothing compares to the complexity of trying to solve this problem when using a BULK INSERT with a format file. – Walker Jun 16 '16 at 18:24

2 Answers2

0

SQL_Latin1_General_CP1_CI_AS is code page 1252 While your text file is UTF-8

I see everyone telling me to use UTF-8 or 16 etc, but I just ran into this issue with writing a clr to encrypt data. I wrote it with UTF-8 based on everything I read, I was able to troubleshoot and determine that the code page was 1252 based on some other web searches.

Here is SO post on it check out @dunos answer Can SQL Server SQL_Latin1_General_CP1_CI_AS be safely converted to Latin1_General_CI_AS?

https://social.msdn.microsoft.com/Forums/sqlserver/en-US/196b4586-1338-434d-ba8c-49fa3c9bdeeb/server-collation-sqllatin1generalcp1cias-versus-latin1generalcias?forum=sqlgetstarted

Community
  • 1
  • 1
Matt
  • 13,833
  • 2
  • 16
  • 28
  • So by making a change to the format file as outlined in these forums I'll be able to solve my problem? – Walker Jun 16 '16 at 18:30
  • That's a good question, I don't use BULK INSERT very much. In SSIS I would set the formatting to UTF-8 but that is a different import technique. perhaps a search will turn something up on it. – Matt Jun 16 '16 at 19:01
0

Try to use Bulk Insert using CODEPAGE = 'ACP'

ncfuncion
  • 227
  • 1
  • 3
  • 11