
The application I am building allows a user to upload a .csv file containing multiple rows and columns of data. Each row contains a unique varchar Id. This will ultimately fill in fields of an existing SQL table where there is a matching Id.

Step 1: I am using LinqToCsv and a foreach loop to import the .csv fully into a temporary table.

Step 2: Then I have another foreach loop where I am trying to loop the rows from the temporary table into an existing table only where the Ids match.

Controller Action to complete this process:

[HttpPost]
public ActionResult UploadValidationTable(HttpPostedFileBase csvFile)
{
    var inputFileDescription = new CsvFileDescription
    {
        SeparatorChar = ',',
        FirstLineHasColumnNames = true
    };
    var cc = new CsvContext();
    var filePath = uploadFile(csvFile.InputStream);
    var model = cc.Read<Credit>(filePath, inputFileDescription);

    try
    {
        var entity = new TestEntities();
        var tc = new TemporaryCsvUpload();
        foreach (var item in model)
{
            tc.Id = item.Id;
            tc.CreditInvoiceAmount = item.CreditInvoiceAmount;
            tc.CreditInvoiceDate = item.CreditInvoiceDate;
            tc.CreditInvoiceNumber = item.CreditInvoiceNumber;
            tc.CreditDeniedDate = item.CreditDeniedDate;
            tc.CreditDeniedReasonId = item.CreditDeniedReasonId;
            tc.CreditDeniedNotes = item.CreditDeniedNotes;
            entity.TemporaryCsvUploads.Add(tc);
        }

        var idMatches = entity.PreexistingTable.Where(x => x.Id == tc.Id);

        foreach (var number in idMatches)
        {
            number.CreditInvoiceDate = tc.CreditInvoiceDate;
            number.CreditInvoiceNumber = tc.CreditInvoiceNumber;
            number.CreditInvoiceAmount = tc.CreditInvoiceAmount;
            number.CreditDeniedDate = tc.CreditDeniedDate;
            number.CreditDeniedReasonId = tc.CreditDeniedReasonId;
            number.CreditDeniedNotes = tc.CreditDeniedNotes;
        }
        entity.SaveChanges();
        entity.Database.ExecuteSqlCommand("TRUNCATE TABLE TemporaryCsvUpload");

        TempData["Success"] = "Updated Successfully";

    }
    catch (LINQtoCSVException)
    {
        TempData["Error"] = "Upload Error: Ensure you have the correct header fields and that the file is of .csv format.";
    }

    return View("Upload");
}

The issue in the above code is that tc is assigned inside the first loop, but the matches are queried after the loop with var idMatches = entity.PreexistingTable.Where(x => x.Id == tc.Id);, so idMatches only reflects the last item from the first loop.

If I nest the second loop inside the first, it is far too slow (I stopped it after 10 minutes), because there are roughly 1,000 rows in the .csv and 7,000 in the preexisting table.

Finding a better way to do this is plaguing me. Pretend that the temporary table didn't even come from a .csv and just think about the most efficient way to fill in rows in table 2 from table 1 where the id of that row matches. Thanks for your help!

– cfly24

3 Answers


As your code is written now, much of the work is being done in the application that could be done far more efficiently by SQL Server, and you are making hundreds of unnecessary round trips to the database. When you are mass-importing data you want a solution like this:

  1. Bulk import the data (see the sketch after this list). See this answer for helpful guidance on bulk import efficiency with EF.
  2. Join and update destination table.
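
For step 1, here is a minimal sketch of the bulk import using SqlBulkCopy; csvTable (a DataTable built from the parsed CSV rows) and connectionString are names invented for this example, not from the question:

using System.Data;
using System.Data.SqlClient;

// Sketch only: load all parsed CSV rows into the staging table in one
// round trip instead of one INSERT per row. "csvTable" and
// "connectionString" are assumed to exist.
using (var connection = new SqlConnection(connectionString))
{
    connection.Open();
    using (var bulkCopy = new SqlBulkCopy(connection))
    {
        // Column names in csvTable must match the destination table.
        bulkCopy.DestinationTableName = "TemporaryCsvUpload";
        bulkCopy.WriteToServer(csvTable);
    }
}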

Processing the import should only require a single mass update query:

update PT set
   CreditInvoiceDate = CSV.CreditInvoiceDate
  ,CreditInvoiceNumber = CSV.CreditInvoiceNumber
  ,CreditInvoiceAmount = CSV.CreditInvoiceAmount
  ,CreditDeniedDate = CSV.CreditDeniedDate
  ,CreditDeniedReasonId = CSV.CreditDeniedReasonId
  ,CreditDeniedNotes = CSV.CreditDeniedNotes
from PreexistingTable PT
join TemporaryCsvUploads CSV on PT.Id = CSV.Id

This query would replace your entire nested loop and apply the same update in a single database call. As long as your table is indexed properly this should run very fast.
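
Since you are already on Entity Framework, the query can be issued from the existing context in place of the second foreach (a sketch reusing entity and the table names from the question):

// Sketch: replaces the match-and-copy loop with a single set-based UPDATE.
entity.Database.ExecuteSqlCommand(
    @"update PT set
         CreditInvoiceDate    = CSV.CreditInvoiceDate,
         CreditInvoiceNumber  = CSV.CreditInvoiceNumber,
         CreditInvoiceAmount  = CSV.CreditInvoiceAmount,
         CreditDeniedDate     = CSV.CreditDeniedDate,
         CreditDeniedReasonId = CSV.CreditDeniedReasonId,
         CreditDeniedNotes    = CSV.CreditDeniedNotes
      from PreexistingTable PT
      join TemporaryCsvUploads CSV on PT.Id = CSV.Id");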

– mellamokb
  • **Performance Improvement** (200 Rows in .CSV; 7,500 in Preexisting Table): **Nested Loops:** Stopped at 11 minutes... **Mass Update Query:** 13.5 seconds. – cfly24 Jul 30 '15 at 14:21

After saving the CSV records into a second table that has the same fields as your primary table, execute the following stored procedure in SQL Server:

create proc [dbo].[excel_updation]
as
set xact_abort on

begin transaction

-- First update rows that already exist
update first_table
   set [ExamDate]    = source.[ExamDate],
       [marks]       = source.[marks],
       [result]      = source.[result],
       [dob]         = source.[dob],
       [spdate]      = source.[spdate],
       [agentName]   = source.[agentName],
       [companycode] = source.[companycode],
       [dp]          = source.[dp],
       [state]       = source.[state],
       [district]    = source.[district],
       [phone]       = source.[phone],
       [examcentre]  = source.[examcentre],
       [examtime]    = source.[examtime],
       [dateGiven]   = source.[dateGiven],
       [smName]      = source.[smName],
       [smNo]        = source.[smNo],
       [bmName]      = source.[bmName],
       [bmNo]        = source.[bmNo]
  from first_table
 inner join second_table source
    on first_table.[UserId] = source.[UserId]

-- And then insert rows that do not exist yet
insert into first_table ([UserId], [ExamDate], [marks], [result], [dob], [spdate], [agentName], [companycode], [dp], [state], [district], [phone], [examcentre], [examtime], [dateGiven], [smName], [smNo], [bmName], [bmNo])
select [UserId], [ExamDate], [marks], [result], [dob], [spdate], [agentName], [companycode], [dp], [state], [district], [phone], [examcentre], [examtime], [dateGiven], [smName], [smNo], [bmName], [bmNo]
  from second_table source
 where not exists
       (
          select *
            from first_table
           where first_table.[UserId] = source.[UserId]
       )

commit transaction

delete from second_table

The only requirement of this code is that both tables have a matching id column. Where an id matches in both tables, the data for that row is updated in the first table; rows whose id has no match are inserted.
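
Once the procedure has been created in the database, it can be invoked from the question's EF context in a single call (a sketch, assuming the procedure above exists):

// Sketch: run the update-then-insert procedure in one database call.
entity.Database.ExecuteSqlCommand("exec [dbo].[excel_updation]");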

– Sheenu Mehra

As long as the probability of a match is high, you can simply attempt an update with every row from your CSV, with a condition that the id matches:

UPDATE table SET ... WHERE id = @id

– Leonid Usov
  • Would that work straight from a Csv if I have 1000 rows with 7 columns each. Column 1 is the Id and only 3 of the next 6 will be filled in with data. So if the Id of the row in the csv matches with the id in the database row, the 3 other fields in that row must be filled in. Does that make sense? – cfly24 Jul 30 '15 at 03:17
  • well EF allows you to create an object of your entity type, set its properties as you wish (for example, from the csv object), then `Attach` it to your context, mark it as changed and call `SaveChanges` on it. see [this article](http://stackoverflow.com/a/3594608/5171225) option 2 for more details – Leonid Usov Jul 30 '15 at 03:44
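
For reference, here is a minimal sketch of the pattern that comment describes, written against the question's context; it assumes the entity type behind entity.PreexistingTable is named PreexistingTable and that Id is its key:

var entity = new TestEntities();
foreach (var item in model) // model: the rows parsed from the CSV
{
    // Attach a stub holding only the key; EF treats it as an existing,
    // unchanged row, so no SELECT is issued.
    var row = new PreexistingTable { Id = item.Id };
    entity.PreexistingTable.Attach(row);

    // Properties set after Attach are detected as modified and become
    // part of the UPDATE generated by SaveChanges.
    row.CreditInvoiceAmount = item.CreditInvoiceAmount;
    row.CreditInvoiceDate = item.CreditInvoiceDate;
    row.CreditInvoiceNumber = item.CreditInvoiceNumber;
}
// One UPDATE per row; note SaveChanges throws if an Id has no matching row.
entity.SaveChanges();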