3

I've got a pipe delimited flat file I want to load with Polybase. String columns are quoted with double quotes and if there are double quotes in the value, they are doubled. I believe that's a valid file format:

 SizeDescription|SizeCode
 "3.5 feet"|"3.5 ft"
 "2.5 inches"|"2.5"""

If I don't specify STRING_DELIMITER then the quotes show up in my table I'm inserting to. If I do specify STRING_DELIMITER='"' then it works properly for the first row (3.5 ft) but fails on the second row (2.5").

Am I doing something wrong? Or is this not supported?

GregGalloway
  • 11,355
  • 3
  • 16
  • 47

1 Answers1

1

The behavior is correct. If you don't specify quote (") as the STRING_DELIMITER, the parser is splitting on the pipe as you expect. If you specify the quote(") as the STRING_DELIMITER, you end up with 3 columns for the first row:

{3.5 feet} {|} {3.5 ft}

And 4 columns for the second row:

{2.5 inches} {|} {2.5} {}

Polybase will fail with the number of columns mismatch.

Matt Usher
  • 1,325
  • 6
  • 10
  • Thanks for helping me understand how Polybase is currently operating and for answering my question, Matt. Unfortunately I believe this behavior needs to be improved to be able to parse this valid file format. (Create a CSV in Excel with a quote character in a cell and see how it saves as CSV. It matches my file format above.) Respectfully, I would call this Polybase behavior a bug. I've created a UserVoice for this: http://feedback.azure.com/forums/307516-sql-data-warehouse/suggestions/9882219-fix-string-delimiter-implementation-in-polybase – GregGalloway Sep 23 '15 at 04:21