0

I am trying to Import a Csv File into SQL SERVER using SSIS

Here's an example how data looks like

Student_Name,Student_DOB,Student_ID,Student_Notes,Student_Gender,Student_Mother_Name
Joseph Jade,2005-01-01,1,Good listener,Male,Amy
Amy Jade,2006-01-01,1,Good in science,Female,Amy
....

Csv Columns are not containing text qualifiers (quotations)

I Created a simple package using SSIS to import it into SQL but sometime the data in SQL looked like below

Student_Name    Student_DOB Student_ID  Student_Notes   Student_Gender  Student_Mother_Name
Ali Jade    2004-01-01  1   Good listener   Bad in science  Male,Lisa

The Reason was that somtimes [Student_Notes] column contains Comma (,) that is used as column delimiter so the Row are not imported Correctly

Any suggestions

Hadi
  • 36,233
  • 13
  • 65
  • 124
  • Change the column delimiter, it is bad practice to have the column delimiter inside column data – DanielVorph Nov 03 '16 at 22:41
  • how to do it... i am not the ont who created these csv files – Hadi Nov 03 '16 at 22:43
  • Maybe you can create a program in some programming language so you read these file line by line and process data as you want – DanielVorph Nov 03 '16 at 22:47
  • 1
    Can you automatically determine which is the offending comma? If so you can write a script to clean it up. It seems like one way to fix this would be to count the number of commas per row and if there are too many, remove them starting from the right. Can you manually apply this algorithm to your data and see if it is correct. If so I can help you write a script that cleans it up beforehand. – Nick.Mc Nov 03 '16 at 22:49
  • i can add a script component to my package and do some code... can u help me writing it – Hadi Nov 03 '16 at 22:55
  • First you need to confirm that that is the algorithm you want to use. No use writing code if it doesn't work. Open your CSV file in notepad and comfirm that removing commas working back from the right hand side will fix it. i.e. do commas appear in any other fields? – Nick.Mc Nov 04 '16 at 03:20
  • Commas does not appear in any other fields. – Hadi Nov 04 '16 at 06:13
  • I really dont understand the algorithm u r talking about. Can you give me some code plz? – Hadi Nov 04 '16 at 10:53
  • 2
    I will give you some code but a word of advice: never say 'code plz' it's kind of annoying. – Nick.Mc Nov 04 '16 at 12:01

3 Answers3

1

A word of warning: I'm not a regular C# coder.

But anyway this code does the following:

It opens a file called C:\Input.TXT

It searches each line. If the line has more than 5 commas, it takes all the extra commas out of the third last field (notes)

It writes the result to C:\Output.TXT - that's the one you need to actually import

There are many improvements that could be made:

  • Get file paths from connection managers
  • Error handling
  • An experienced C# programmer could probably do this in hlaf the code

Keep in mind your package will need write access to the appropriate folder

public void Main()
{
    // Search the file and remove extra commas from the third last field
    // Extended from code at
    // http://stackoverflow.com/questions/1915632/open-a-file-and-replace-strings-in-c-sharp
    // Nick McDermaid        

    string sInputLine;
    string sOutputLine;
    string sDelimiter = ",";
    String[] sData;
    int iIndex;

    // open the file for read
    using (System.IO.FileStream inputStream = File.OpenRead("C:\\Input.txt"))
    {
        using (StreamReader inputReader = new StreamReader(inputStream))
        {
            // open the output file
            using (StreamWriter outputWriter = File.AppendText("C:\\Output.txt"))
            {
                // Read each line
                while (null != (sInputLine = inputReader.ReadLine()))
                {
                    // Grab each field out
                    sData = sInputLine.Split(sDelimiter[0]);
                    if (sData.Length <= 6)
                    {
                        // 6 or less fields - just echo it out
                        sOutputLine = sInputLine;
                    }
                    else
                    {
                        // line has more than 6 pieces 
                        // We assume all of the extra commas are in the notes field                                

                        // Put the first three fields together
                        sOutputLine =
                            sData[0] + sDelimiter +
                            sData[1] + sDelimiter +
                            sData[2] + sDelimiter;

                        // Put the middle notes fields together, excluding the delimiter
                        for (iIndex=3; iIndex <= sData.Length - 3; iIndex++)
                        {
                            sOutputLine = sOutputLine + sData[iIndex] + " ";
                        }

                        // Tack on the last two fields
                        sOutputLine = sOutputLine +
                            sDelimiter + sData[sData.Length - 2] +
                            sDelimiter + sData[sData.Length - 1];


                    }

                    // We've evaulted the correct line now write it out
                    outputWriter.WriteLine(sOutputLine);
                }
            }
        }
    }


    Dts.TaskResult = (int)Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success;
}
Nick.Mc
  • 18,304
  • 6
  • 61
  • 91
1

In The Flat File Connection Manager. Make the File as only one column (DT_STR 8000)

Just add a script Component in the dataflowtask and Add Output Columns (Same as Example Shown)

in The script component split each row using the following Code:

\\Student_Name,Student_DOB,Student_ID,Student_Notes,Student_Gender,Student_Mother_Name

Dim strCells() as string = Row.Column0.Split(CChar(","))

Row.StudentName = strCells(0)
Row.StudentDOB = strCells(1)
Row.StudentID = strCells(2)
Row.StudentMother = strCells(strCells.Length - 1)
Row.StudentGender = strCells(strCells.Length - 2)

Dim strNotes as String = String.Empty

For int I = 3 To strCells.Length - 3

strNotes &= strCells(I)

Next

Row.StudentNotes = strNotes

it worked fine for me

0

If import CSV file is not a routine

  1. Import CSV file in Excel
  2. Search error rows with Excel rows filter and rewrite them
  3. Save Excel file in TXT Tab delimited
  4. Import TXT file with SSIS Else make a script that search comma in the Student Notes column range
Hadi
  • 36,233
  • 13
  • 65
  • 124
Jean Noel
  • 947
  • 9
  • 7
  • Please just explain the second step. When working with hundreds of rows !! – Hadi Nov 04 '16 at 10:51
  • Order the last column desc to identify wrong rows. Then you can make some data cleaning with Excel only on wrong rows (append cells, etc...). Obviously this workaround works only if there is a single comma in Student Note cell. Another solution is forbidding comma in web application text field – Jean Noel Nov 04 '16 at 11:01