0

I have an excel sheet with some rows. Alongside, I have a db table that has the same definition as the excel sheet in terms of column structure. What I want to do is check if a record exists in the excel sheet but not in the db table then insert it into a new table (an existing empty table) called TableReport for reporting purposes.

CSV File: PathToFile.csv

Original table to do comparison on: FruitTable

New table created for reporting: TableReport

The following code snippets are what you can use to test my scenario. The code below works as I am using a List<T> when PathToFile.csv is relatively quite small- As in like 4 rows. Takes under 30 seconds to complete execution of the program. However my real scenario involves PathToFile.csv have about 200 000 rows and thus the list collection is not so efficient in terms of performance. Thus, I have considered using a Dictionary<TKey, TValue> collection but I am stuck as to which parts within the code to tweak because with both collections, I will have to iterate the entire csv to get all the rows and add the to the checkIfFruitsMatch list in this case. Even if I use a dictionary, I would still need to loop and add them before I even perform the comparision and already that is time consuming. Performance is a very critical requirement in this case.

I tried running the program with my current implementation of the list on a csv with 200 000 rows and it took well over 15 minutes, busy looping through the csv and adding the rows to the list and that didn't even finish before I terminated the program.

How can I achieve this to make the program much more faster. It shouldn't take longer than 10 minutes to execute. I have written this in a Windows Forms Application.

SQL Table Definition:

CREATE TABLE [dbo].[FruitTable](
    [Id] [int] IDENTITY(1,1) NOT NULL,
    [Apples] [nvarchar](20) NOT NULL,
    [Oranges] [nvarchar](20) NOT NULL,
    [Pears] [nvarchar](20) NOT NULL,
    [Bananas] [nvarchar](20) NOT NULL,
    [DateAddedUtc] [nvarchar](50) NULL
) ON [PRIMARY]

GO

Stored Procedure Definition:

CREATE PROC [dbo].[spAddFruitsToDB]
@Apples [nvarchar](20),
@Oranges [nvarchar](20),
@Pears [nvarchar](20),
@Bananas [nvarchar](20),
@DateAddedUtc [nvarchar](50)
AS
BEGIN
    INSERT INTO TableReport --Has the same definition as FruitTable
    VALUES (@Apples, @Oranges, @Pears, @Bananas, @DateAddedUtc)
END

Code:

    public class FruitClass {

            private SqlConnection mySQLConnection;
            private SqlCommand mySQLCommand;
            private SqlDataReader mySQLDataReader;
            private string myConnectionString;

        private void CheckDataValidity()
        {   
            Microsoft.Office.Interop.Excel.Application Excel_app = new Microsoft.Office.Interop.Excel.Application();
            Microsoft.Office.Interop.Excel.Workbooks work_books = Excel_app.Workbooks;

            Microsoft.Office.Interop.Excel.Workbook work_book = work_books.Open("C:\\PathToFile.csv");

            Microsoft.Office.Interop.Excel.Sheets work_sheets = work_book.Worksheets;
            Microsoft.Office.Interop.Excel.Worksheet work_sheet = (Microsoft.Office.Interop.Excel.Worksheet)work_sheets.get_Item(1);

            List<FruitClass> checkIfFruitsMatch = new List<FruitClass>();
            List<FruitClass> dbFruitsToMatch= new List<FruitClass>();
            string fruitTagNumberForApples = "";

            for (int j = 2; j < work_sheet.Rows.Count; j++)
            {
                FruitClass fruitInstance = new FruitClass();
                fruitInstance.Apples = CellToString(work_sheet.Cells[j, 3]).Trim();
                fruitInstance.Oranges = CellToString(work_sheet.Cells[j, 13]).Trim();
                fruitInstance.Pears = CellToString(work_sheet.Cells[j, 14]).Trim();
                fruitInstance.Bananas = CellToString(work_sheet.Cells[j, 15]).Trim();

                fruitTagNumberForApples = fruitInstance.Apples;

                checkIfFruitsMatch.Add(fruitInstance);

                if (fruitTagNumberForApples == null || fruitTagNumberForApples == "" || fruitTagNumberForApples == string.Empty)
                break;
            }

            //Get fruits in excel and do a comparison with fruits in database table Fruit
            dbFruitsToMatch.Add(ReturnFruitRow());

            IEnumerable<FruitClass> listComparer = checkIfFruitsMatch.Except(dbFruitsToMatch);
            foreach (FruitClass i in listComparer)
            {
                using (var db = new DBEntities())
                {
                    int countDBexisting = db.FruitTable.Where(x => x.Apples == i.Apples).Count();
                    if (countDBexisting > 0) 
                    {
                        //Fruit has been previously logged. No need to insert a duplicate
                    }
                    else
                    {
                        LogFruitToDB(i, "spAddFruitsToDB"); //Insert records into a new table called "TableReport"
                    }
                }
            }

            work_book.Close();
            Excel_app.Quit();

        }


        private void LogFruitToDB(FruitClass fruitInstance, string cmdText)
            {
                myConnectionString = ConfigurationManager.ConnectionStrings["ConnectionString"].ConnectionString;
                using (mySQLConnection = new SqlConnection(myConnectionString))
                {
                    mySQLCommand = new SqlCommand(cmdText, mySQLConnection);
                    mySQLCommand.CommandType = CommandType.StoredProcedure;

                    SqlParameter Apples_Parameter = new SqlParameter
                    {
                        ParameterName = "@Apples",
                        Value = fruitInstance.Apples
                    };
                    mySQLCommand.Parameters.Add(Apples_Parameter);

                    SqlParameter Oranges_Parameter = new SqlParameter
                    {
                        ParameterName = "@Oranges",
                        Value = fruitInstance.Oranges
                    };
                    mySQLCommand.Parameters.Add(Oranges_Parameter);

                    SqlParameter Pears_Parameter = new SqlParameter
                    {
                        ParameterName = "@Pears",
                        Value = fruitInstance.Pears
                    };
                    mySQLCommand.Parameters.Add(Pears_Parameter);

                    SqlParameter Bananas_Parameter = new SqlParameter
                    {
                        ParameterName = "@Bananas",
                        Value = fruitInstance.Bananas
                    };
                    mySQLCommand.Parameters.Add(Bananas_Parameter);

                    SqlParameter DateAddedUtc_Parameter = new SqlParameter
                    {
                        ParameterName = "@DateAddedUtc",
                        Value = DateTime.UtcNow.ToString()
                    };
                    mySQLCommand.Parameters.Add(DateAddedUtc_Parameter);

                    mySQLConnection.Open();
                    mySQLCommand.ExecuteNonQuery();
                }
            }

private FruitClass ReturnFruitRow()
{
    FruitClass fruitInfo = new FruitClass();
    myConnectionString = ConfigurationManager.ConnectionStrings["ConnectionString"].ConnectionString;
    using (mySQLConnection = new SqlConnection(myConnectionString))
    {
        string procedureName = "select * from dbo.FruitTable";
        mySQLCommand = new SqlCommand(procedureName, mySQLConnection);
        mySQLCommand.CommandType = CommandType.Text;
        mySQLCommand.Connection = mySQLConnection;
        mySQLCommand.Connection.Open();
        mySQLDataReader = mySQLCommand.ExecuteReader();
        if (mySQLDataReader.HasRows)
        {
            while (mySQLDataReader.Read())
            {
                fruitInfo.Apples = mySQLDataReader.GetString(1);
                fruitInfo.Oranges = mySQLDataReader.GetString(2);
                fruitInfo.Pears = mySQLDataReader.GetString(3);
                fruitInfo.Bananas = mySQLDataReader.GetInt32(4).ToString();
            }
        }
        mySQLCommand.Connection.Close();
    }
    return fruitInfo;
}

        private string CellToString(object p)
        {
            try
            {
                return ((Microsoft.Office.Interop.Excel.Range)p).Value.ToString();
            }
            catch
            {
                return "";
            }
        }
    }

    public class FruitClass
    {
        public string Apples;
        public string Oranges;
        public string Pears;
        public string Bananas;
    }

Note: The csv file came in as a normal .xlsx excel file with columns then got saved as a .csv.

Test.csv: Shown below enter image description here

So, say FruitTable had a matching record: enter image description here

Then, Table Report should look like below when the program is finished: enter image description here

But now real scenario has about 200 000 records. Also worth mentioning that this application is run once a month.

Harold_Finch
  • 682
  • 2
  • 12
  • 33

2 Answers2

1

I think the problem is that you are reading the .csv via interop.

You will gain a lot of time if you read the csv as a flat file subsequently manipulating its values.

see Reading CSV file and storing values into an array

Ole EH Dufour
  • 2,968
  • 4
  • 23
  • 48
  • In my scenario the csv data is not separated by semicolons. It is column based – Harold_Finch Mar 10 '18 at 11:41
  • But you could save it as such manually. I you wish do to so by interop see https://stackoverflow.com/a/5034856/4180382 – Ole EH Dufour Mar 10 '18 at 11:43
  • @Harold_Finch column separated is simply tab spaced, right? – InBetween Mar 10 '18 at 11:46
  • @InBetween No, please see the screenshot which I just added to the question. – Harold_Finch Mar 10 '18 at 11:54
  • 2
    @Harold_Finch what I’m seeing is how excel opens and interprets the csv which has very little do to with what csv format you are dealing with. Open the file with a regular text editor and see what the separator is. – InBetween Mar 10 '18 at 12:07
  • Okay, I see the output in the data is comma separated. If I may ask, why so much emphasis in how the file is opened? @InBetween – Harold_Finch Mar 10 '18 at 12:09
  • Because now you don't need interop ;-) – Ole EH Dufour Mar 10 '18 at 12:10
  • 1
    Opening file in excel is to make it more nicely visible and readable by human eye. But as a program what makes more sense how easily and faster it can be read. So if you can read the excel file as a CSV without using any complex logic it makes more sense. – Chetan Mar 10 '18 at 12:10
  • It took me quite a while to understand what you guys were saying but I got a working solution and it's way faster now! Thanks – Harold_Finch Mar 10 '18 at 14:26
  • Good news that is Harold. Happy coding and have a nice day! – Ole EH Dufour Mar 10 '18 at 15:22
0

Your method will be very expensive in performance. I advise you another method:

1) Create a temp_report table that has the structure of the rows you want to put in your report table.

2) At the beginning of your program empty this new table

delete from dbo.temp_report

3) Get an empty datable yourdatatable with

DataTable yourdatatable = new DataTable();
SqlConnection conn = new SqlConnection (connString);
SqlCommand cmd = new SqlCommand ("select * from dbo.temp_report", conn);
conn.Open ();

// create data adapter
SqlDataAdapter da = new SqlDataAdapter (cmd);

// this will be your datatable
da.Fill (yourdatatable);
conn.Close ();

4) Insert your csv lines into the datatable (c#)

for(here your loop on csv file)
{
    row = yourdatatable.NewRow();
    row["id"] = i;
    row["item"] = "item " + i.ToString();
    yourdatatable.Rows.Add(row);
}

5) Send all your lines to the database using the bulk copy method:

using (SqlConnection destinationConnection = new SqlConnection (connString))
{
   destinationConnection.Open ();

   using (SqlBulkCopy bulkCopy = new SqlBulkCopy (destinationConnection))
   {
       bulkCopy.DestinationTableName = "dbo.temp_report";

       try
       {
           // Write from the source to the destination.
           bulkCopy.WriteToServer (yourdatatable);
       }
       catch (Exception ex)
       {
           Console.WriteLine (ex.Message);
       }

   }
}

6) Insert into the TableReport table the rows that are not in the FruitTable table by making a query that makes the difference (and not by C # code)

7) Note that you can also improve your performance by reading your csv file text with a split on the separator (tab, or, depending on your file)

Esperento57
  • 16,521
  • 3
  • 39
  • 45