-1

Description of the current situation:

I have an excel file of approximately 315 columns and 4000 rows. The file contains the answers to a 300-question questionnaire. The data format is as follows:

(Headers)   A | B | C | D | E | F       | Q.1 | Q.2 | ... | Q.300 |
(FirstRow)  Info of first participant   | AnswerCode for every Q  | 

The columns A to F contain contain info on every participant, while the columns Q.1 to Q.300 contain the respective answer code to each question. After storing the file as a large DataTable:

I need to load all 4000 rows on an existing database table, but before I do that I must edit the data format. The end result must become:

ParticipantCode | QuestionCode | AnswerCode | DateOfRegistration
00001           | 0001         | 1234567    | yyyy-MM-dd HH:mm:ss
...             | ...          | ...        | ...
00001           | 0300         | 1234567    | yyyy-MM-dd HH:mm:ss
00002           | 0001         | 1234567    | yyyy-MM-dd HH:mm:ss
...             | ...          | ...        | ...
04000           | 0300         | 1234567    | yyyy-MM-dd HH:mm:ss

So every row of the original ExcelDataTable is transformed into 300 rows in the FinalDataTable. In this way, the FinalDataTable will have about 1.2 million rows.

What Have I implemented so far:

Private Function MyFunction()
    For Each ExcelRow As DataRow In ExcelDataTable.Rows
        For Each ExcelColumn As DataColumn In ExcelDataTable.Columns
            QuestionCodeFound = False
            ExcelColumnNameRaw = ExcelColumn.ColumnName.ToString.Trim
            If ExcelColumnNameRaw.StartsWith("Q") Then
                ' Correct the headers
                ExcelColumnSplit = ExcelColumnNameRaw.Split("#")
                ExcelColumnName = String.Concat(ExcelColumnSplit(0), ExcelColumnSplit(1))

                SelectedRowFromDT = QuestionCodeAndQuestionIDDataTable.Select("QuestionID = '" + ExcelColumnName + "'")

                ' Search for "_", because some questions are different
                If SelectedRowFromDT.Length > 0 Then
                    QuestionCodeFound = True
                Else
                    Dim ExcelColumnSplitForMult As String()
                    ExcelColumnSplitForMult = ExcelColumnName.Split("_")
                    SelectedRowFromDT = QuestionCodeAndQuestionIDDataTable.Select("QuestionID = '" + ExcelColumnSplitForMult(0).ToString + "'")
                    If SelectedRowFromDT.Length > 0 Then
                        QuestionCodeFound = True
                    End If
                End If
                If QuestionCodeFound Then
                    Dim QuestionCode As String
                    Dim QuestionTypeDataTable As DataTable
                    Dim QuestionType As String

                    ' Get the Question Type from the respective table
                    QuestionType = String.Empty
                    QuestionCode = SelectedRowFromDT(0).Item("QuestionCode").ToString
                    QuestionTypeDataTable = SearchInSql(My.Settings.ConnectionString, SQLString)

                    If QuestionTypeDataTable.Rows.Count > 0 Then
                        QuestionType = QuestionTypeDataTable.Rows(0).Item(0).ToString.Trim
                    End If

                    ' Fix the Date Format 
                    DateRaw = ExcelRow.Item(1).ToString
                    DateSplit = DateRaw.Split("/")
                    If DateSplit(0).Length = 1 Then
                        DateSplit(0) = String.Concat("0", DateSplit(0))
                    End If
                    If DateSplit(1).Length = 1 Then
                        DateSplit(1) = String.Concat("0", DateSplit(1))
                    End If
                    DateText = String.Concat(DateSplit(0), "/", DateSplit(1), "/", DateSplit(2))
                    DateRegistration = DateTime.ParseExact(DateText, "MM/dd/yyyy", CultureInfo.InvariantCulture)
                    DateRegistrationReformed = DateRegistration.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture)
                    DateRegFinal = DateTime.ParseExact((DateRegistrationReformed + " " + "10:00:00").ToString, "yyyy-MM-dd HH:mm:ss", CultureInfo.InvariantCulture)

                    Dim AnswerValue As String
                    Dim AnswerCode As String
                    Dim AnswerCodeDataTable As DataTable
                    Dim QuestionWasAnswer As String

                    Dim AnswerValueRow() As DataRow = ExcelDataTable.Select("ParticipantCode = '" + ExcelRow.Item(2).ToString + "'")

                    AnswerCodeDataTable = New DataTable
                    AnswerValue = ""
                    QuestionWasAnswer = "0"

                    ' Complete "QuestionWasAnswer" field for all questions and retrieve the AnswerCode for the answer given by each participant
                    If AnswerValueRow.Length > 0 And AnswerValueRow(0).Item(ExcelColumnNameRaw).GetType IsNot GetType(DBNull) Then
                        If Not (QuestionType.Equals("02") Or QuestionType.Equals("03")) Then
                            AnswerValue = AnswerValueRow(0).Item(ExcelColumnNameRaw)
                            QuestionWasAnswer = "1"
                        ElseIf QuestionType.Equals("02") Or QuestionType.Equals("03") Then
                            Dim ExcelColumnSplitForMultSecond As String()
                            Dim MultAnswerValue As String

                            ExcelColumnSplitForMultSecond = ExcelColumnName.Split("_")
                            MultAnswerValue = AnswerValueRow(0).Item(ExcelColumnNameRaw).ToString.Trim
                            AnswerValue = ExcelColumnSplitForMultSecond(1).ToString

                            If MultAnswerValue.Equals("1") Then
                                QuestionWasAnswer = "1"
                            ElseIf MultAnswerValue.Equals("2") Then
                                QuestionWasAnswer = "2"
                            End If
                        End If

                        ' Search in the Answers table for the existing AnswerCode
                        SQLString = String.Format("SELECT Answers.AnswerCode
                                                       FROM Answers
                                                       WHERE Answers.QuestionCode = '{0}'
                                                           AND (Answers.AnswerNumber = '{1}' OR Answers.Answer = '{1}')", QuestionCode, AnswerValue)
                        AnswerCodeDataTable = SearchInSql(My.Settings.ConnectionString, SQLString)

                        If AnswerCodeDataTable.Rows.Count > 0 Then
                            AnswerCode = AnswerCodeDataTable.Rows(0).Item(0).ToString
                            FormattedDataTable.Rows.Add(ParticipantAnswerCode, ExcelRow.Item(2), QuestionCode, AnswerCode, QuestionWasAnswer, DateRegFinal)
                            ParticipantAnswerCode = Convert.ToInt32(ParticipantAnswerCode + 1).ToString.PadLeft(ParticipantAnswerCodeFieldLength, "0")
                        Else
                            ' If a given answer does not exist, save it in the respective table and then try again
                            Dim AnswerCodeLength = GetLengthFromSqlDataBase(My.Settings.ConnectionString, "Answers", "AnswerCode")
                            Dim NextAnswerCode = CalculateNextAnswerCode(AnswerCodeLength)
                            Dim NestAnswerNumber = CalculateNextAnswerNumber(QuestionCode)

                            SaveNewAnswer(NextAnswerCode, QuestionCode, NestAnswerNumber, AnswerValue)

                            SQLString = String.Format("SELECT Answers.AnswerCode
                                                       FROM Answers
                                                       WHERE Answers.QuestionCode = '{0}'
                                                           AND Answers.Answer = '{1}'", QuestionCode, AnswerValue)
                            AnswerCodeDataTable = SearchInSql(My.Settings.ConnectionString, SQLString)

                            If AnswerCodeDataTable.Rows.Count > 0 Then
                                AnswerCode = AnswerCodeDataTable.Rows(0).Item(0).ToString
                                FormattedDataTable.Rows.Add(ParticipantAnswerCode, ExcelRow.Item(2), QuestionCode, AnswerCode, QuestionWasAnswer, DateRegFinal)
                                ParticipantAnswerCode = Convert.ToInt32(ParticipantAnswerCode + 1).ToString.PadLeft(ParticipantAnswerCodeFieldLength, "0")
                            End If
                        End If
                    End If
                End If
            End If
        Next
    Next
    Return FormattedDataTable
End Function

After that, I bulk insert the FinalDataTable on the DB.

The problem I am facing:

Using the current program I built, every row in the ExcelDataTable takes about 40 seconds to transform into 300 rows in the FinalDataTable. If I try to load all 4000 rows, it will take more than 40 hours to transform the entire datatable. I need to find a faster way to do this.

M_J
  • 23
  • 4
  • 1
    You've not really given us a lot to go on here. 4000 rows is not a huge number, so the reason it's taking so long must either how your parsing the data or what your doing with the parsed results. Neither of which you've given us any information. Going by the little code you have provided each pass consists of 4 steps. Can you identify through debugging which is slowest, then update your question with that process – Hursey Sep 26 '21 at 19:50
  • You're trying to transform 4000 rows into 1.2 million rows. Since you didn't provide the complete code and some sample data (10 rows to transform into 3000 rows), we can't really help you to optimize the code. I just want to offer an alternative solution: using unpivot in SQL. You can read the excel file using `OPENROWSET` or bulk insert the excel into a temporary table first. Then insert the unpivoted data into your table. – Han Sep 27 '21 at 07:57
  • Thank you both for your replies! I will edit the post as soon as I find some time! – M_J Sep 27 '21 at 09:56
  • Can you provide some input (data from excel) and output (your expected transformed data) sample? – Han Sep 27 '21 at 19:39
  • It seems you query a table `SELECT Answers.AnswerCode...` in each of your iteration. Is it possible that the same question and answer are queried multiple times? If yes, you can use some kind of caching. Check the cache, if the question is not found, retrieve from the database and keep it in cache. The next time you need it again, you can find it in the cache. You can use Dictionary object to store the questions. – Han Sep 27 '21 at 19:44
  • 1
    @Han The dictionary solution you proposed reduced the time needed, thank you! – M_J Sep 29 '21 at 07:19

1 Answers1

0

As mentioned, there isn't much to go off of on this with what has been provided.

I'm sure there are more helpful fixes to consider but I wanted to put my two cents in about the For Loops.

I recommend switching the

For Each

statements with

For i as integer = 0 to ExcelDataTable.Rows.Count - 1

I've read that For Each is not as performance-friendly as it gathers each "row" as a collection, therefore increasing the overhead per loop.

Here is a SO post about this subject:

Major difference between 'for each' and 'for' loop in .NET

Not sure if that will make a difference for you but thought I would recommend it anyway.

J. Rockwood
  • 118
  • 7