73

I'm trying to import a correctly quoted CSV file, meaning data is only quoted if it contains a comma, e.g.:

41, Terminator, Black
42, "Monsters, Inc.", Blue

I observe that the first row imports correctly, but the second row errors in a manner that suggests the quoted comma was treated as a field separator.

I have seen suggestions such as this one

SQL Bulk import from CSV

to change the field terminator

FIELDTERMINATOR='","'

However, my CSV file only quotes fields that need it, so I do not believe that suggestion would work.

Can SQL Server's BULK INSERT statement import a correctly quoted CSV file? How?

Eric J.
  • Can't be done. The SQL Server Import methods (both BCP and BULK INSERT) do not understand quoting. – RBarryYoung Oct 16 '12 at 03:12
  • Here is a similar post with more options http://stackoverflow.com/questions/782353/sql-server-bulk-insert-of-csv-file-with-inconsistent-quotes – Murray Foxcroft Dec 12 '16 at 16:45
  • That isn't correctly quoted. As per RFC4180, quotes should either be around ALL fields or NO fields. Partially-quoted CSVs are invalid, and cannot be loaded with BULK IMPORT. They can be opened in Excel, in PowerShell or even by OPENROWSET, but they can't be bulk imported directly. – Geoff Griswald Jun 01 '21 at 15:51
  • @GeoffGriswald I don't agree with your conclusion about the RFC4180 spec. Sections 2.5 and 2.6 show examples of mixed-use of quotes and non-quotes, and the ABNF grammar seems to state that any given field can be quoted or non-quoted. However, it is solid advice where possible to always quote every field and use the answer from Dominix. – Eric J. Jun 01 '21 at 20:10
  • Eric, like many people you have misinterpreted the examples. 2.5 shows 2 examples, one fully quoted and one non-quoted. 2.6 shows 2 examples, one fully quoted which spans two lines (to demonstrate that one field contains a CRLF) and one non-quoted which spans one line. There is no partially-quoted CSV spec and RFC4180 does not provide for one. This is purely something that has come about through Excel and its historically poor handling of the CSV format (which was done deliberately, to promote ".XLS"). Excel 2019 finally provides options to handle CSV files correctly, but the damage is done. – Geoff Griswald Jun 03 '21 at 09:15
  • (you should really rename your question to: "Bulk Insert Partially-Quoted CSV File in SQL Server") – Geoff Griswald Jun 03 '21 at 09:19
  • @GeoffGriswald Good suggestion about renaming the question (done). No matter the examples, I still read the ABNF grammar as allowing quoted or non-quoted per element. And yes, Microsoft and other vendors didn't particularly follow the RFC. – Eric J. Jun 03 '21 at 16:01

12 Answers

35

Unfortunately, SQL Server interprets the quoted comma as a delimiter. This applies to both BCP and BULK INSERT.

From http://msdn.microsoft.com/en-us/library/ms191485%28v=sql.100%29.aspx

If a terminator character occurs within the data, it is interpreted as a terminator, not as data, and the data after that character is interpreted as belonging to the next field or record. Therefore, choose your terminators carefully to make sure that they never appear in your data.
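For the OP's sample data, a minimal repro of the failure might look like this (the table name and file path are assumptions, not from the original post):

```sql
-- With a plain comma terminator, BULK INSERT splits on EVERY comma,
-- including the one inside "Monsters, Inc.", so the second row ends up
-- with four fields instead of three.
BULK INSERT dbo.Films
FROM 'C:\data\films.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');
```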

iruvar
  • Since SQL Server 2017 we have the FORMAT='CSV' option which can be used for BULK INSERT. Provided your CSV file is FULLY quoted, the fields can now contain commas. – Geoff Griswald Jun 03 '21 at 09:25
33

I know this is an old topic, but this feature has now been implemented since SQL Server 2017. The parameter you're looking for is FIELDQUOTE=, which defaults to '"'. See more at https://learn.microsoft.com/en-us/sql/t-sql/statements/bulk-insert-transact-sql?view=sql-server-2017
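A minimal sketch of the SQL Server 2017+ syntax (the table name and file path are assumptions):

```sql
BULK INSERT dbo.Films
FROM 'C:\data\films.csv'
WITH (
    FORMAT = 'CSV',    -- enables RFC 4180-style parsing (SQL Server 2017+)
    FIELDQUOTE = '"',  -- the default, shown here for clarity
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n'
);
```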

Dominix
  • Took quite a while for SQL Server to add proper support :-) – Eric J. Jun 25 '18 at 21:46
  • Oh, then maybe I can use the new bcp.exe with SQL Server 2012, if it supports such a parameter as well (and not just the BULK COPY directive - or whatever that is). EDIT: no, there isn't a parameter. Maybe there is a global (not per column) format file directive then, hopefully. – bitoolean Nov 26 '18 at 14:53
  • Doesn't work with partially-quoted files like in OPs example – Geoff Griswald May 07 '21 at 11:21
  • If my original question didn't mix quoting and non-quoting, I would accept this one instead. It's certainly sound advice where possible to always quote every field and use this approach. – Eric J. Jun 01 '21 at 20:14
  • I argue that quoting every field "always" is not the right approach. For instance, `""` indicates *"I know the string has 0 characters"*, whereas no quotes indicates *"I do not know"*. The two are different statements, and to blur that line is a risk to data integrity. – r2evans Jan 06 '22 at 14:52
22

There is another solution for this.

Treat the quotes as part of the field delimiters by editing the .fmt file.

You can check this out for more information:

http://blogs.msdn.com/b/sqlserverfaq/archive/2010/02/04/how-to-remove-unwanted-quotation-marks-while-importing-a-data-file.aspx

An extract of the link above:

The only way to remove the quotation marks would be to modify the column delimiters specified during the import operation. The only drawback here is that if you inspect the data to be inserted, you will very quickly realize that the column delimiters are different for each column (Delimiters highlighted above).

So to specify different column delimiters for each column, you would need to use a format file if you plan to use Bulk Insert or BCP. If you generate a format file for the above table structure, it would be as follows:

9.0
3
1       SQLCHAR       0       5       "\t"     1     FName              SQL_Latin1_General_CP1_CI_AS
2       SQLCHAR       0       5       "\t"     2     LName              SQL_Latin1_General_CP1_CI_AS
3       SQLCHAR       0       50      "\r\n"   3     Company            SQL_Latin1_General_CP1_CI_AS

Modify the format file to represent the correct column delimiters for each column. The new format file to be used will look like this:

9.0
4
1       SQLCHAR       0       0     "\""      0     FIRST_QUOTE      SQL_Latin1_General_CP1_CI_AS
2       SQLCHAR       0       5     "\",\""   1     FNAME               SQL_Latin1_General_CP1_CI_AS
3       SQLCHAR       0       5     "\",\""   2     LNAME            SQL_Latin1_General_CP1_CI_AS
4       SQLCHAR       0       50    "\"\r\n"  3     COMPANY          SQL_Latin1_General_CP1_CI_AS
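To use the modified format file, point BULK INSERT (or bcp) at it; a sketch, assuming the file above is saved as people.fmt (the table name and paths are assumptions):

```sql
BULK INSERT dbo.People
FROM 'C:\data\people.csv'
WITH (FORMATFILE = 'C:\data\people.fmt');
```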
Rafael Segovia
  • @parakmiakos: The answer was updated with a summary. – Eric J. Dec 15 '14 at 19:38
  • CSV file: `"field"",""","second_field"`, that means double-quoting the whole field and doubling (`""`) double quotes inside a field. If you now use `","` as a delimiter between field 1 and 2, you would get 3 fields, although you just wanted 2 fields. Am I wrong? – Niklas Peter Sep 23 '16 at 11:30
  • When inserting data that does not have an empty row on the last line, this makes a double-quote show up on the last column of the last row. I have not yet found a way around this issue... – Wouter Dec 01 '16 at 11:12
  • Just REPLACE the double-quote with nothing after your bulk insert has completed – Geoff Griswald May 07 '21 at 11:23
8

Make sure you have enabled the Text qualifier option and set it to `"`.

Hilikus
Bangxin
  • This is the correct answer. It _is_ possible if you set a Text Qualifier. This is done when choosing the Data Source in the Import and Export Wizard – Hilikus May 04 '18 at 15:40
  • …and this only works with the BULK COPY command and not the bcp.exe? :-( And I have to rewrite the whole command? :-( Or maybe I just need one of those mysterious format files if there's a setting that I don't need to specify for each column (hopefully! I'm trying to use the same command for many tables). – bitoolean Nov 26 '18 at 14:50
  • I mean it's *an* answer. For true bulk operations when loading dozens or hundreds of files, then using the Import and Export Wizard often is not an option. – Geoff Griswald Jun 01 '21 at 15:53
5

I had this same problem and didn't want to go the SSIS route, so I found a PowerShell script that is easy to run and handles the case of a quoted field containing a comma:

Source Code and DLL for the PowerShell Script: https://github.com/billgraziano/CsvDataReader

Here's a blog that explains the usage: http://www.sqlteam.com/article/fast-csv-import-in-powershell-to-sql-server

sloppyjoe
  • The PowerShell way indeed solves this question because import-csv itself is smart enough to handle quoted field that includes field terminator. It is a decent solution IMHO – jyao Jun 14 '16 at 18:26
  • I don't know if "smart" enough is the right word. It is tolerant enough of badly-formatted .CSV files to be able to work with partially-quoted files. An excellent solution to be sure. – Geoff Griswald Jun 01 '21 at 15:54
4

It seems this now works on partially-quoted CSV files as long as you include FORMAT='CSV' in the options. Once I added this, it worked as expected with a partially-quoted file.

ouflak
Nick
3

You could also look at using OpenRowSet with the CSV text file data provider.

This should be possible with any version of SQL Server >= 2005 although you need to enable the feature.

http://social.msdn.microsoft.com/forums/en-US/sqldataaccess/thread/5869d247-f0a0-4224-80b3-ff2e414be402
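A sketch of the ad hoc query approach using the text OLE DB provider (the provider version, folder path, and connection-string options are assumptions; 'Ad Hoc Distributed Queries' must be enabled on the server):

```sql
-- Database= points at the folder; the "table" is the file name.
SELECT *
FROM OPENROWSET(
    'Microsoft.ACE.OLEDB.12.0',
    'Text;Database=C:\data;HDR=NO;FMT=Delimited',
    'SELECT * FROM films.csv');
```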

Pascal Belloncle
3

I've spent half a day on this problem. It's best to import using SQL Server Import & Export data wizard. There is a setting in that wizard which solves this problem. Detailed screenshots here: https://www.mssqltips.com/sqlservertip/1316/strip-double-quotes-from-an-import-file-in-integration-services-ssis/ Thanks

mye.morr
2

I had the same problem, with data that only occasionally double-quotes some text. My solution is to let the BULK LOAD import the double-quotes, then run a REPLACE on the imported data.

For example:

bulk insert CodePoint_tbl from "F:\Data\Map\CodePointOpen\Data\CSV\ab.csv" with (FIRSTROW = 1, FIELDTERMINATOR = ',', ROWTERMINATOR='\n');

update CodePoint_tbl set Postcode = replace(Postcode,'"','') where charindex('"',Postcode) > 0

To make it less painful to write the REPLACE script, just copy and paste what you need from the results of something like this:

select C.ColID, C.[name] as Columnname into #Columns
from syscolumns C
join sysobjects T on C.id = T.id
where T.[name] = 'User_tbl'
order by 1;

declare @QUOTE char(1);
set @QUOTE = Char(39);
select 'Update User_tbl set '+ColumnName+'=replace('+ColumnName+','
 + @QUOTE + '"' + @QUOTE + ',' + @QUOTE + @QUOTE + ');
GO'
from #Columns
where ColID > 2
order by ColID;
Keith MacDonald
2

I had the same problem; however, it worked for me with the following settings:

bulk insert schema.table
from '\\your\data\source.csv'
with (
    datafiletype = 'char',
    format = 'CSV',
    firstrow = 2,
    fieldterminator = '|',
    rowterminator = '\n',
    tablock
);

My CSV-File looks like this:

"col1"|"col2"
"val1"|"val2"
"val3"|"val4"

My problem was that I had rowterminator set to '0x0a' before, and it did not work. Once I changed it to '\n', it started working. (When '\n' is specified, BULK INSERT implicitly treats it as '\r\n', so it matches Windows-style line endings that a bare 0x0a does not.)

Orsinus
0

Per the CSV format specification, I don't think it matters whether or not the data is quoted, as long as it adheres to the specification. Extra quotes should be handled by the parser, if it's properly implemented. FIELDTERMINATOR should be a comma and ROWTERMINATOR the line end; this denotes a standard CSV file. Did you try to import your data with these settings?

Victor Zakharov
  • There are no *excessive* quotes in my input data, only the exact amount of quotes required to quote fields that happen to contain a comma. When importing data that is properly CSV quoted, with FIELDTERMINATOR being a comma and the correct ROWTERMINATOR, the bulk import chokes in a manner that indicates it does not understand the quoted comma. – Eric J. Oct 16 '12 at 01:27
  • @EricJ.: I understand your situation. But I did not know Microsoft implemented their bulk import using String.Split(","), which is the way beginner programmers think CSV works. Shame on Microsoft. You have two options: reparse CSV and add extra quotes, then use the method you mentioned, OR create a program to generate and execute INSERT statements based on a CSV file. If you go with the last one, make sure you don't transaction-ize them. – Victor Zakharov Oct 16 '12 at 13:23
  • The issue is that people don't understand how a .CSV file should be created. There are two options: Fully-quoted (quotes around all fields), or Non-quoted (no quoted fields). This idea of a partially-quoted CSV file has come about because that is how Microsoft Excel often creates them, but that is not a valid format and is not supported by any database engine other than Microsoft Access. – Geoff Griswald Jun 03 '21 at 09:22
0

Been stung by the same :)

I wrapped this logic into a function to clean up data that had already been imported:

DECLARE @str NVARCHAR(MAX);
DECLARE @quote_identifier NVARCHAR(MAX);

SET @quote_identifier = N'"';

SET @str = N'"quoted stuff"';

-- Strip the surrounding quote characters only when the value is quoted
-- at both ends; otherwise return the value unchanged.
SELECT IIF(
           LEFT(@str, 1) = @quote_identifier
           AND RIGHT(@str, 1) = @quote_identifier,
           SUBSTRING(@str, LEN(@quote_identifier) + 1, LEN(@str) - 2 * LEN(@quote_identifier)),
           @str);
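Applied to a whole column after a bulk load, the same logic can be sketched like this (the table and column names are hypothetical):

```sql
-- Strip one leading and one trailing quote from values quoted at both ends.
UPDATE dbo.Staging_tbl
SET RawValue = SUBSTRING(RawValue, 2, LEN(RawValue) - 2)
WHERE RawValue LIKE '"%"'
  AND LEN(RawValue) >= 2;
```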