3

I have an excel file which I receive daily. The number of columns in that file is not specific. My requirement is just to load the last column in my table through SSIS. How will I be able to identify last used column dynamically?

Hadi
  • 36,233
  • 13
  • 65
  • 124
  • is your problem solved? I think that importing excel files using SSIS has many issues and this is one of them so i am interested to see the right answer. Why not replying or accepting one of the answers provided. This helps the others seeking for a similar question – Yahfoufi Dec 12 '17 at 13:58

3 Answers3

3

You can use c# script:

Make sure you add Using System.Data.OleDb; to the Namespaces Region and add output Column LastCol and select data type.

public override void CreateNewOutputRows()
    {
        /*
          Add rows by calling the AddRow method on the member variable named "<Output Name>Buffer".
          For example, call MyOutputBuffer.AddRow() if your output was named "MyOutput".
        */
        string fileName = @"C:\test.xlsx";
        string SheetName = "Sheet1";
        string cstr = "Provider.ACE.OLEDB.12.0;Data Source=" + fileName + ";Extended Properties=\"Excel 12.0;HDR=No;IMEX=1\"";

    OleDbConnection xlConn = new OleDbConnection(cstr);
    xlConn.Open();

    OleDbCommand xlCmd = xlConn.CreateCommand();
    xlCmd.CommandText = "Select * from [" + SheetName + "]";
    xlCmd.CommandType = CommandType.Text;
    OleDbDataReader rdr = xlCmd.ExecuteReader();

    int rowCt = 0; //Counter

    while (rdr.Read())
    {
        //skip headers
        if (rowCt != 0)
        {
            int maxCol = rdr.FieldCount;
            Output0Buffer.AddRow();
            Output0Buffer.LastCol = (int)rdr[maxCol];
        }
        rowCt++; //increment counter
    }
}
KeithL
  • 5,348
  • 3
  • 19
  • 25
  • I know this is late, but someone just upvoted and made me look again. I should have included xlConn.Close(); Without it, the connection will remain open until the package completes and you cannot manipulate the file (i.e. move to a processed folder). – KeithL Feb 12 '18 at 15:23
1

Solution Overview

Use a Script Task to:

  • Get the last column index
  • Use the following function to convert the index to Column Letter (ex: 1 -> A)

    Private Function GetExcelColumnName(columnNumber As Integer) As String
        Dim dividend As Integer = columnNumber
        Dim columnName As String = String.Empty
        Dim modulo As Integer
    
        While dividend > 0
           modulo = (dividend - 1) Mod 26
           columnName = Convert.ToChar(65 + modulo).ToString() & columnName
           dividend = CInt((dividend - modulo) / 26)
       End While
    
       Return columnName
    End Function
    
  • Build the SQL Command that read only the last column

  • Choose this query as Excel Source

Detailed Solution

This answer is assuming that the Sheet Name is Sheet1, and the programming language used is VB.Net

  1. First create an SSIS variable of type string (i.e. @[User::strQuery])
  2. Add another variable that contains the Excel File Path (i.e. @[User::ExcelFilePath])
  3. Add A Script Task, and select @[User::strQuery] as ReadWrite Variable, and @[User::ExcelFilePath] as ReadOnly Variable (in the script task window)
  4. Set the Script Language to VB.Net and in the script editor window write the following script:

Note: you have to imports System.Data.OleDb

    m_strExcelPath = Dts.Variables.Item("ExcelFilePath").Value.ToString

    Dim strSheetname As String = String.Empty
    Dim intLastColumn As Integer = 0

    m_strExcelConnectionString = Me.BuildConnectionString()
    Try


        Using OleDBCon As New OleDbConnection(m_strExcelConnectionString)

            If OleDBCon.State <> ConnectionState.Open Then
                OleDBCon.Open()
            End If

            'Get all WorkSheets
            m_dtschemaTable = OleDBCon.GetOleDbSchemaTable(OleDbSchemaGuid.Tables,
                                                               New Object() {Nothing, Nothing, Nothing, "TABLE"})

            'Loop over work sheet to get the first one (the excel may contains temporary sheets or deleted ones

            For Each schRow As DataRow In m_dtschemaTable.Rows
                strSheetname = schRow("TABLE_NAME").ToString

                If Not strSheetname.EndsWith("_") AndAlso strSheetname.EndsWith("$") Then

                    Using cmd As New OleDbCommand("SELECT * FROM [" & strSheetname & "]", OleDBCon)

                        Dim dtTable As New DataTable("Table1")


                        cmd.CommandType = CommandType.Text

                        Using daGetDataFromSheet As New OleDbDataAdapter(cmd)

                            daGetDataFromSheet.Fill(dtTable)

                        End Using
                    'Get the last Column Index
                    intLastColumn =  dtTable.Columns.Count

                    End Using

                    'when the first correct sheet is found there is no need to check others
                    Exit For

                End If
            Next

            OleDBCon.Close()

        End Using

    Catch ex As Exception
        Throw New Exception(ex.Message, ex)
    End Try

    Dim strColumnname as String = GetExcelColumnName(intLastColumn)
    Dts.Variables.Item("strQuery").Value = "SELECT * FROM [" & strSheetname & strColumnname & ":" & strColumnname & "]"

    Dts.TaskResult = ScriptResults.Success
End Sub


Private Function GetExcelColumnName(columnNumber As Integer) As String
    Dim dividend As Integer = columnNumber
    Dim columnName As String = String.Empty
    Dim modulo As Integer

    While dividend > 0
       modulo = (dividend - 1) Mod 26
       columnName = Convert.ToChar(65 + modulo).ToString() & columnName
       dividend = CInt((dividend - modulo) / 26)
   End While

   Return columnName
End Function
  1. Then you have to add an Excel connection manager, and choose the excel file that you want to import (just select a sample to define the metadata for the first time only)
  2. Assign a default value of Select * from [Sheet1$] to the variable @[User::strQuery]
  3. In the Data Flow Task add an Excel Source, choose SQL Command from variable, and select @[User::strQuery]
  4. Set the DataFlow Task Delay Validation property to True
  5. Add other components to DataFlow Task

References

Hadi
  • 36,233
  • 13
  • 65
  • 124
0

No you can't do that. The number of columns and data types must be determined beforehand and can't be changed. Otherwise SSIS will fail. So no way to dynamically get the last column. Workaround is maybe get the last column from inside excel using some macro then use that as the source for SSIS.

thotwielder
  • 1,563
  • 7
  • 44
  • 83
  • This is not true at all, SSIS has many option that give dynamicity to the Data Flow, there are many methods that can achieve this. Just take a look at my answer and KeithL answer. And by the way, i am not the one who downvoted you question. – Hadi Dec 08 '17 at 20:44