I'm trying to import a .CSV file into a SQL Server table using a BULK INSERT with a format file. I can get it to import, but any Latin characters are imported as strange characters. I take a lot of pride in completing this personal project on my own, but I've hit a point where I just need help. I can change the characters by doing some messy UPDATE and REPLACE statements after importing the data, but I really want to be able to import the Latin characters as they appear in the .CSV file in one step. Here's the database and table that I created:
CREATE DATABASE Test;
CREATE TABLE dbo.rawData
([Position] nvarchar(500) NULL,
[Const] nvarchar(500) NULL,
[Created] nvarchar(500) NULL,
[Modified] nvarchar(500) NULL,
[Description] nvarchar(500) NULL,
[Title] nvarchar(500) NOT NULL,
[TitleType] nvarchar(500) NULL,
[Directors] nvarchar(500) NULL,
[YouRated] nvarchar(500) NULL,
[IMDbRating] nvarchar(500) NULL,
[Runtime] nvarchar(500) NULL,
[Year] nvarchar(500) NULL,
[Genres] nvarchar(500) NULL,
[NumVotes] nvarchar(500) NULL,
[ReleaseDate] nvarchar(500) NULL,
[URL] nvarchar(500) NULL,
)
GO
And here is some of the data that I'm working with taken from a .CSV file (saved as ratings.csv). I use Notepad++ and it is encoded in UTF-8. Notice how the last row for "Dallas Buyers Club" has a director with a Latin character in his name:
"position","const","created","modified","description","Title","Title type","Directors","You rated","IMDb Rating","Runtime (mins)","Year","Genres","Num. Votes","Release Date (month/day/year)","URL"
"1","tt0437863","Tue Feb 16 00:00:00 2016","","","The Benchwarmers","Feature Film","Dennis Dugan","5","5.6","80","2006","comedy, romance, sport","39413","2006-04-07","http://www.imdb.com/title/tt0437863/"
"2","tt0085334","Tue Feb 16 00:00:00 2016","","","A Christmas Story","Feature Film","Bob Clark","6","8.1","94","1983","comedy, family","103770","1983-11-18","http://www.imdb.com/title/tt0085334/"
"3","tt2403029","Tue Feb 16 00:00:00 2016","","","The Starving Games","Feature Film","Jason Friedberg, Aaron Seltzer","2","3.3","83","2013","comedy","13719","2013-10-31","http://www.imdb.com/title/tt2403029/"
"4","tt0316465","Tue Feb 16 00:00:00 2016","","","Radio","Feature Film","Michael Tollin","6","6.9","109","2003","biography, drama, sport","31692","2003-10-24","http://www.imdb.com/title/tt0316465/"
"5","tt0141369","Tue Feb 16 00:00:00 2016","","","Inspector Gadget","Feature Film","David Kellogg","4","4.1","78","1999","action, adventure, comedy, family, sci_fi","35340","1999-07-18","http://www.imdb.com/title/tt0141369/"
"6","tt0033563","Tue Feb 16 00:00:00 2016","","","Dumbo","Feature Film","Sam Armstrong, Norman Ferguson","6","7.3","64","1941","animation, family, musical","80737","1941-10-23","http://www.imdb.com/title/tt0033563/"
"7","tt0384642","Tue Feb 16 00:00:00 2016","","","Kicking & Screaming","Feature Film","Jesse Dylan","5","5.5","95","2005","comedy, family, romance, sport","29539","2005-05-01","http://www.imdb.com/title/tt0384642/"
"8","tt0116705","Tue Feb 16 00:00:00 2016","","","Jingle All the Way","Feature Film","Brian Levant","7","5.4","89","1996","comedy, family","66879","1996-11-16","http://www.imdb.com/title/tt0116705/"
"9","tt1981677","Tue Feb 16 00:00:00 2016","","","Pitch Perfect","Feature Film","Jason Moore","7","7.2","112","2012","comedy, music, romance","203205","2012-09-28","http://www.imdb.com/title/tt1981677/"
"10","tt0409459","Tue Feb 16 00:00:00 2016","","","Watchmen","Feature Film","Zack Snyder","7","7.6","162","2009","action, mystery, sci_fi","368137","2009-02-23","http://www.imdb.com/title/tt0409459/"
"11","tt1343092","Tue Feb 16 00:00:00 2016","","","The Great Gatsby","Feature Film","Baz Luhrmann","5","7.3","143","2013","drama, romance","345664","2013-05-01","http://www.imdb.com/title/tt1343092/"
"12","tt0332379","Tue Feb 16 00:00:00 2016","","","School of Rock","Feature Film","Richard Linklater","5","7.1","108","2003","comedy, music","202083","2003-09-09","http://www.imdb.com/title/tt0332379/"
"13","tt0120783","Tue Feb 16 00:00:00 2016","","","The Parent Trap","Feature Film","Nancy Meyers","6","6.4","128","1998","adventure, comedy, drama, family, romance","82087","1998-07-20","http://www.imdb.com/title/tt0120783/"
"14","tt0790636","Tue Feb 16 00:00:00 2016","","","Dallas Buyers Club","Feature Film","Jean-Marc Vallée","7","8.0","117","2013","biography, drama","308118","2013-09-07","http://www.imdb.com/title/tt0790636/"
I have a format file (saved as format.fmt), that when opened in Notepad++ looks like this:
11.0
16
1 SQLCHAR 0 1000 "\",\"" 1 Position SQL_Latin1_General_CP1_CI_AS
2 SQLCHAR 0 1000 "\",\"" 2 Const SQL_Latin1_General_CP1_CI_AS
3 SQLCHAR 0 1000 "\",\"" 3 Created SQL_Latin1_General_CP1_CI_AS
4 SQLCHAR 0 1000 "\",\"" 4 Modified SQL_Latin1_General_CP1_CI_AS
5 SQLCHAR 0 1000 "\",\"" 5 Description SQL_Latin1_General_CP1_CI_AS
6 SQLCHAR 0 1000 "\",\"" 6 Title SQL_Latin1_General_CP1_CI_AS
7 SQLCHAR 0 1000 "\",\"" 7 TitleType SQL_Latin1_General_CP1_CI_AS
8 SQLCHAR 0 1000 "\",\"" 8 Directors SQL_Latin1_General_CP1_CI_AS
9 SQLCHAR 0 1000 "\",\"" 9 YouRated SQL_Latin1_General_CP1_CI_AS
10 SQLCHAR 0 1000 "\",\"" 10 IMDbRating SQL_Latin1_General_CP1_CI_AS
11 SQLCHAR 0 1000 "\",\"" 11 Runtime SQL_Latin1_General_CP1_CI_AS
12 SQLCHAR 0 1000 "\",\"" 12 Year SQL_Latin1_General_CP1_CI_AS
13 SQLCHAR 0 1000 "\",\"" 13 Genres SQL_Latin1_General_CP1_CI_AS
14 SQLCHAR 0 1000 "\",\"" 14 NumVotes SQL_Latin1_General_CP1_CI_AS
15 SQLCHAR 0 1000 "\",\"" 15 ReleaseDate SQL_Latin1_General_CP1_CI_AS
16 SQLCHAR 0 1000 "\"" 16 URL SQL_Latin1_General_CP1_CI_AS
When I run the following code, everything imports, however the Latin characters are replaced with a series of strange characters. Here is the code that I'm running:
BULK INSERT [Test].[dbo].[rawData]
FROM 'C:\IMDbRatings\Files\ratings.csv' WITH (FIRSTROW = 2, FORMATFILE= 'C:\IMDbRatings\format.fmt');
A few things I tried were changing the .CSV file to UCS-2 BE, adding different conditions in the WITH clause of the BULK INSERT, and changing the variable type in the format file to SQLNCHAR instead of SQLCHAR, but nothing worked. Often what happens in these cases is "0 rows are affected", rather than an error. Any help would be so appreciated.