0

Example output viewing the CSV in Notepad and Excel

Notepad++

1234,"50 the Street",45,60.88,87
5678,"50 the Road
50 the Road",65,70.99,98
9101,"50 the Avenue",12,09.00,43

Excel

enter image description here

In Notepad++, the second line is split into two rows, with the text field repeated on both. The text delimiter (") shows at the beginning on the first row and the end on the second. Opening the CSV in Excel the line is not split and the text isn't duplicated.

I have multiple files to import, each about 300k rows and 45 columns. These files are generated on a regular basis for import (I have no control over generating the CSV files). Can anyone explain why this difference exists and if there is a work around? I don't want to have to manually edit every CSV after its been generated. Thank you.

jarlh
  • 42,561
  • 8
  • 45
  • 63
CMR
  • 1
  • 2
  • There is a CrLF at the end of the text field when I look in Notepadd++. Why would this duplicate the text and not cause a new line in Excel? – CMR Dec 21 '20 at 10:55
  • 1
    A valid CSV may contain fields with line breaks if they are wrapped in quotes. This CSV looks valid. Excel most likely imports the text fully which you can see by widening the second row. In SSIS you probably haven't set up the text delimiters for your fields to be `"`. – GSerg Dec 21 '20 at 10:58
  • _"Why would this duplicate the text and not cause a new line in Excel"_ does it, though? If there is a CRLF and duplicated text in Notepad++, then it is there in the CSV content. What Excel does during import is another story. – Fildor Dec 21 '20 at 10:58
  • The second line has only one double quote. – jdweng Dec 21 '20 at 11:00
  • 1
    @jdweng Because the other double quote is [on the third line](https://en.wikipedia.org/wiki/Comma-separated_values#Basic_rules). – GSerg Dec 21 '20 at 11:01
  • Is there any way to fix this via SSIS (C# in script component if needed)? I cant think of a way to find these issues? There are around 10 in each file of 300k rows. Or am I better of saving each as an Excel and importing it as an Excel source? – CMR Dec 21 '20 at 11:14
  • You have a Flat File component in SSIS. You set it up there. – GSerg Dec 21 '20 at 11:18
  • @Gserg the Flat File component cant read the file in the same way Excel does. I need the file to look like it does in Excel. With the flat file component it tries to import the data as it is split like in notepadd++. There is a CRLF at the end of the text field and no closing text delimeter. – CMR Dec 21 '20 at 12:00
  • 2
    @CMR https://i.stack.imgur.com/vro9f.png – GSerg Dec 21 '20 at 13:03
  • Thanks @GSerg. The raw file has multiple instances of Quotes within the text delimeters which means I cant just specify it in the import. The only way Ive got it imported so far is by converting it to Excel first – CMR Dec 21 '20 at 16:44
  • I lost track of the issue. Your test data, if you open that in excel you get the multiple lines as shown in the text. Click on that cell and you will see: https://i.stack.imgur.com/BUGun.png. Flat file connector can handle that as @GSerg is pointing out. Is there other data situations not shared here? Can you share a sample of what you mean by "multiple instances of quotes with the text delimiters"? – Tim Mylott Dec 21 '20 at 22:00
  • The text field is not duplicated in Excel (agree it is in the example I posted but not when I use the full csv), I dont understand why. I have sites like "Company Name "Main Plant" Area 2" - there are a few text fields for company name, address etc. – CMR Dec 22 '20 at 17:55
  • Ive opened all csv files in Excel and saved them as xlsx files. These have now all imported without issue. Its not a process I want to keep but it'll have to do for this year! – CMR Dec 22 '20 at 17:56
  • So what you are saying is that you have invalid CSV files where the quoted multiline fields have quotes in them. If the files are indeed invalid, fix the generation process to [escape the quotes](https://stackoverflow.com/q/17808511/11683) in the quoted fields. – GSerg Dec 23 '20 at 06:47

0 Answers0