0

I have been given an Excel file from a customer. It has 4 columns: Id, Name, Place, Date.

I have a table in my database which stores these values. I have to check each row from Excel and compare its values to the database table. If a row already exists, compare the dates and update the database with the latest date from Excel. If the row does not exist yet, insert a new row.

I'm fetching each row and comparing its values in a for loop, then updating the database with insert/update statements through a data table adapter.

My problem is that this operation takes 4+ hours to update the data. Is there a more efficient way to do this? I have searched a lot and found options like SqlBulkCopy, but how would I compare each and every row against the database?

I'm using ASP.NET with C# and SQL Server.

Here's my code:

for (var row = 2; row <= workSheet.Dimension.End.Row; row++)
{
    // Get data from Excel
    var id = workSheet.Cells[row, 1].Text;
    var name = workSheet.Cells[row, 2].Text;
    var place = workSheet.Cells[row, 3].Text;
    var dateInExcel = DateTime.Parse(workSheet.Cells[row, 4].Text);

    // Check whether this Id exists in the database;
    // existingRows is the DataTable filled by the data table adapter's query for this Id.
    if (existingRows.Rows.Count == 0) // no row exists in database
    {
        // Insert the row using the data table adapter's insert statement
    }
    else // Id exists in database
    {
        var dateInDb = (DateTime)existingRows.Rows[0]["Date"];
        if (dateInDb < dateInExcel) // compare dates
        {
            // Update the database with the new date using the data table adapter's update statement
        }
    }
}
  • SqlBulkCopy into a temp table. MERGE into the target table. – mjwills Nov 17 '19 at 11:25
  • How many records are in the destination table? – tymtam Nov 17 '19 at 11:31
  • A similar technique to the one @mjwills mentioned is to load the Excel data into a `DataTable` and pass as a table-valued parameter for use in a T-SQL `MERGE` statement. See [this example](https://stackoverflow.com/questions/58893636/datatable-bulk-copy-to-sql-server-merge-row-id/58894324#58894324). – Dan Guzman Nov 17 '19 at 11:43
  • @DanGuzman Good call. A little slower, but generally simpler to write. – mjwills Nov 17 '19 at 11:45
  • @tymtam: the destination table has no records yet. I will receive this file every month, so the record count will increase tremendously after the first insertion. – Aditi Nov 17 '19 at 12:19
  • A wise man once wrote: "People will probably tell you to use MERGE.... I urge you not to." Check out [this SO answer](https://stackoverflow.com/a/52780490/3094533) – Zohar Peled Nov 17 '19 at 12:34
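
To make the comment suggestions concrete, here is a minimal sketch of that approach (SqlBulkCopy into a staging table, then a single MERGE into the target). It is not the asker's actual code: the target table dbo.Customer(Id, Name, Place, Date), the connectionString variable and the excelData DataTable (the spreadsheet rows already read into memory) are all assumptions.

using System.Data;
using System.Data.SqlClient;

// excelData: DataTable with columns Id, Name, Place, Date read from the spreadsheet (assumed).
// dbo.Customer: hypothetical target table with the same columns.
using (var conn = new SqlConnection(connectionString))
{
    conn.Open();

    // 1. Create a session-scoped staging table and bulk copy the Excel rows into it.
    using (var create = new SqlCommand(
        "CREATE TABLE #Staging (Id nvarchar(50), Name nvarchar(200), Place nvarchar(200), [Date] datetime);",
        conn))
    {
        create.ExecuteNonQuery();
    }

    using (var bulk = new SqlBulkCopy(conn))
    {
        bulk.DestinationTableName = "#Staging";
        bulk.ColumnMappings.Add("Id", "Id");
        bulk.ColumnMappings.Add("Name", "Name");
        bulk.ColumnMappings.Add("Place", "Place");
        bulk.ColumnMappings.Add("Date", "Date");
        bulk.WriteToServer(excelData);
    }

    // 2. One set-based upsert: update the date where the spreadsheet is newer, insert missing rows.
    const string mergeSql = @"
        MERGE dbo.Customer AS target
        USING #Staging AS source
            ON target.Id = source.Id
        WHEN MATCHED AND source.[Date] > target.[Date] THEN
            UPDATE SET target.[Date] = source.[Date]
        WHEN NOT MATCHED BY TARGET THEN
            INSERT (Id, Name, Place, [Date])
            VALUES (source.Id, source.Name, source.Place, source.[Date]);";

    using (var merge = new SqlCommand(mergeSql, conn))
    {
        merge.ExecuteNonQuery();
    }
}

With this shape the per-row round trips disappear: one bulk load plus one set-based MERGE replaces thousands of individual insert/update calls, which is where most of the 4+ hours goes.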

2 Answers

1

@mjwills and @Dan Guzman make very valid points in the comments section.

My suggestion would be to create an SSIS package to import the spreadsheet into a temp table, then use a merge query (or queries) to make conditional updates to the required table(s).

https://learn.microsoft.com/en-us/sql/integration-services/import-export-data/start-the-sql-server-import-and-export-wizard?view=sql-server-ver15

The simplest way to get a good starting point is to use the import wizard in SSMS and save the resulting package. Then create an SSIS project in Visual Studio (you will need the correct version of the BI / SQL Server Data Tools tooling installed for the target SQL Server version).

https://learn.microsoft.com/en-us/sql/ssdt/download-sql-server-data-tools-ssdt?view=sql-server-ver15

https://learn.microsoft.com/en-us/sql/t-sql/statements/merge-transact-sql?view=sql-server-ver15

This approach leverages SQL Server doing what it does best (dealing with relational data sets) and moves the work out of the ASP.NET code.

To invoke this, the ASP.NET app would need to handle the initial file upload and then start the SSIS package.

This can be done by setting up the SSIS package as a SQL Server Agent job with no schedule, and then starting the job when you want it to run.

How to execute an SSIS package from .NET?
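
As a rough illustration of the job-based option above (the linked question covers other ways to run a package from .NET), the app can call msdb.dbo.sp_start_job once the file has been uploaded. The job name and connection string below are placeholders:

using System.Data;
using System.Data.SqlClient;

// Starts the (hypothetical) Agent job "ImportCustomerSpreadsheet" that runs the SSIS package.
// sp_start_job returns as soon as the job has been started; it does not wait for completion.
using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand("msdb.dbo.sp_start_job", conn))
{
    cmd.CommandType = CommandType.StoredProcedure;
    cmd.Parameters.AddWithValue("@job_name", "ImportCustomerSpreadsheet");
    conn.Open();
    cmd.ExecuteNonQuery();
}

Note that the app's SQL login needs permission to start jobs in msdb (for example via one of the SQLAgent roles).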

There are most likely some optimisations that can be made to this approach, but it should work in principle.

Hope this helps :)

-1

10,000 records taking more than 3 × 3,600 s means over 1 s per record; I think it should be possible to improve on that.

Doing the work in the database would give the best performance, but there are a few things you can do before going that far.

  1. Check the basics:

    • Indexes (for example, is the Id column you look rows up by indexed?)
    • Network speed. Is your timing based on running the code on your own computer and talking to a cloud database? If the code and the database are in the same cloud (Azure/Amazon/etc.) it may be much faster than what you're measuring with code running on your office computer talking to a database far away.
  2. Use batches. You should be able to get an order of magnitude better performance if you do the work in batches rather than one record at a time.

    • Get 10, 100, 500 or 1000 records from the Excel file and fetch the corresponding records from the database in one query. Check for existence and compare dates in memory, then do a single save back to the database (a rough sketch follows below).
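
Here is a rough sketch of what that batching could look like; ExcelRow, ReadExcelRows, LoadDbRowsByIds and SaveChanges are hypothetical placeholders standing in for the spreadsheet read, a single `WHERE Id IN (...)` lookup per batch, and a batched save (e.g. a DataAdapter.Update call):

using System.Collections.Generic;
using System.Linq;

const int batchSize = 500;

// ExcelRow is a hypothetical POCO with Id, Name, Place and Date.
var excelRows = ReadExcelRows(workSheet).ToList();   // hypothetical: reads one ExcelRow per spreadsheet row

for (var i = 0; i < excelRows.Count; i += batchSize)
{
    var batch = excelRows.Skip(i).Take(batchSize).ToList();

    // One lookup for the whole batch (e.g. WHERE Id IN (...)), not one query per row.
    var existing = LoadDbRowsByIds(batch.Select(r => r.Id).ToList())   // hypothetical helper
                       .ToDictionary(r => r.Id);

    var toInsert = new List<ExcelRow>();
    var toUpdate = new List<ExcelRow>();

    foreach (var row in batch)
    {
        if (!existing.TryGetValue(row.Id, out var dbRow))
            toInsert.Add(row);              // not in the database yet
        else if (row.Date > dbRow.Date)
            toUpdate.Add(row);              // spreadsheet has a newer date
    }

    // One save per batch, e.g. a DataAdapter.Update call or a parameterised batch command.
    SaveChanges(toInsert, toUpdate);        // hypothetical helper
}

Each batch then costs one lookup query and one save instead of one round trip per record.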