0

I have to import a hundreds records to database from Excel.

Each record has to be verified:

  1. Against duplicate
  2. Has to has foreign key in another table

I’m wondering how should I do this with the highest performance. I know that I shouldn’t use db.SaveChanges(); after each record so after verification - I’m adding each record to temporary list (var recordsToAdd), and I’m saving that list after all. Please check my code below, is this good approach to do this?

using (var db = new DbEntities())
{
    var recordsToAdd = new List<User>();

    for (var row = 2; row <= lastRow; row++)
    {
        var newRecord = new User
        {
            Id = Int32.Parse(worksheet.Cells[idColumn + row].Value.ToNullSafeString()),
            FirstName = worksheet.Cells[firstNameColumn + row].Value.ToNullSafeString(),
            LastName = worksheet.Cells[lastNameColumn + row].Value.ToNullSafeString(),
            SerialNumber = worksheet.Cells[serialNumber + row].Value.ToNullSafeString()
        };

        bool exists = db.User.Any(u => u.Id == newRecord.Id) || recordsToAdd.Any(u => u.Id == newRecord.Id);
        if (!exists)
        {
            bool isSerialNumberExist = db.SerialNumbers.Any(u => u.SerialNumber == newRecord.SerialNumber);
            if (isSerialNumberExist)
            {
                recordsToAdd.Add(newRecord);
            }
            else
            {
                resultMessages.Add(string.Format("SerialNumber doesn't exist"));
            }
        }
        else
        {
            resultMessages.Add(string.Format("Record already exist"));
        }
    }

    db.User.AddRange(recordsToAdd);
    db.SaveChanges();
}
DiPix
  • 5,755
  • 15
  • 61
  • 108
  • What is the type of the `Id` and `SerialNumber` fields - `string`? – Ivan Stoev Oct 23 '16 at 11:27
  • For performance create "newRecord" variable only when the variable "exists" is false. For evaluating "exists" you just need the Id. – derloopkat Oct 23 '16 at 11:28
  • @IvanStoev it depends, in this case it's `string` – DiPix Oct 23 '16 at 11:33
  • @derloopkat good point with creating "newRecord" after "exists" is false. So I should just pick Id from this record before all. – DiPix Oct 23 '16 at 11:37
  • The best approach is to eschew Linq and issue a `MERGE` statement directly to the database server. – Dai Oct 23 '16 at 11:57

3 Answers3

1

One way to improve the performance is to minimize the db calls and linear searches by using a fast lookup data structures for performing the verification - HashSet<string> for Id and Dictionary<string, bool> for SerialNumber:

using (var db = new DbEntities())
{
    var recordsToAdd = new List<User>();

    var userIdSet = new HashSet<string>();
    var serialNumberExistsInfo = new Dictionary<string, bool>();

    for (var row = 2; row <= lastRow; row++)
    {
        var newRecord = new User
        {
            Id = Int32.Parse(worksheet.Cells[idColumn + row].Value.ToNullSafeString()),
            FirstName = worksheet.Cells[firstNameColumn + row].Value.ToNullSafeString(),
            LastName = worksheet.Cells[lastNameColumn + row].Value.ToNullSafeString(),
            SerialNumber = worksheet.Cells[serialNumber + row].Value.ToNullSafeString()
        };

        bool exists = !userIdSet.Add(newRecord.Id) || db.User.Any(u => u.Id == newRecord.Id);
        if (!exists)
        {
            bool isSerialNumberExist;
            if (!serialNumberExistsInfo.TryGetValue(newRecord.SerialNumber, out isSerialNumberExist))
                serialNumberExistsInfo.Add(newRecord.SerialNumber, isSerialNumberExist = 
                    db.SerialNumbers.Any(u => u.SerialNumber == newRecord.SerialNumber));
            if (isSerialNumberExist)
            {
                recordsToAdd.Add(newRecord);              
            }
            else
            {
                resultMessages.Add(string.Format("SerialNumber doesn't exist"));
            }
        }
        else
        {
            resultMessages.Add(string.Format("Record already exist"));
        }
    }

    db.User.AddRange(recordsToAdd);
    db.SaveChanges();
} 
Ivan Stoev
  • 195,425
  • 15
  • 312
  • 343
  • First time I see this: userIdSet.Add(newRecord.Id) - Whether it will try to add element and returns True if the element would be added? – DiPix Oct 23 '16 at 13:40
  • 1
    It's [documented](https://msdn.microsoft.com/en-us/library/bb353005(v=vs.110).aspx): *returns true if the element is added to the HashSet object; false if the element is already present.*. So here `!Add` means the id already exists in the set. – Ivan Stoev Oct 23 '16 at 13:54
1

First of all let's separate the code into two parts. First part is creating a list of valid User records to be inserted. Second part is inserting those records to the database (last two lines of your code).

Assuming you are using EntityFramework as your ORM, second part may be optimized by bulk inserting the records. It has many existing solutions that can be easily found. (example)

There are some suggestions concerning the first part.

  1. Load user ids in a HashSet or Dictionary. These data structures are optimized for searching. var userDbIds = new HashSet<int>(db.User.Select(x => x.Id));. You will quickly check if id exists without making a request to DB.

  2. Do the same for serialNumber. var serialNumbers = new HashSet<string>(db.SerialNumber.Select(x => x.SerialNumber)); assuming that type of SerialNumber property is string.

  3. Change the type of your recordToAdd variable to be Dictionary<int, User> for the same reason.

In the check would look like this:

    bool exists = userDbIds.Contains(newRecord.Id) || recordsToAdd.ContainsKey(newRecord.Id);
    if (!exists)
    {
        bool isSerialNumberExist = serialNumbers.Contains(newRecord.SerialNumber);
        if (isSerialNumberExist)
        {
            recordsToAdd[newRecord.Id] = newRecord;
        }
        else
        {
            resultMessages.Add(string.Format("SerialNumber doesn't exist"));
        }
    }
    else
    {
        resultMessages.Add(string.Format("Record already exist"));
    }
Community
  • 1
  • 1
Ivan Gritsenko
  • 4,166
  • 2
  • 20
  • 34
1

It would be most efficient to use a Table-Valued Parameter instead of LINQ. That way you can handle this in a set-based approach that is a single connection, single stored procedure execution, and single transaction. The basic setup is shown in the example code I provided in the following answer (here on S.O.):

How can I insert 10 million records in the shortest time possible?

The stored procedure can handle both validations:

  • don't insert duplicate records
  • make sure that SerialNumber exists

The User-Defined Table Type (UDTT) would be something like:

CREATE TYPE dbo.UserList AS TABLE
(
  Id           INT NOT NULL,
  FirstName    NVARCHAR(50) NOT NULL,
  LastName     NVARCHAR(50) NULL,
  SerialNumber VARCHAR(50) NOT NULL
);

-- Uncomment the following if you get a permissions error:
-- GRANT EXECUTE ON TYPE::[dbo].[UserList] TO [ImportUser];
GO

The stored procedure (executed via SqlCommand.ExecuteNonQuery) would look something like:

CREATE PROCEDURE dbo.ImportUsers
(
  @NewUserList  dbo.UserList READONLY
)
AS
SET NOCOUNT ON;

INSERT INTO dbo.User (Id, FirstName, LastName, SerialNumber)
  SELECT  tmp.Id, tmp.FirstName, tmp.LastName, tmp.SerialNumber
  FROM    @NewUserList tmp
  WHERE   NOT EXISTS (SELECT *
                      FROM   dbo.User usr
                      WHERE  usr.Id = tmp.[Id])
  AND     EXISTS (SELECT *
                  FROM   dbo.SerialNumbers sn
                  WHERE  sn.SerialNumber = tmp.[SerialNumber]);

The stored procedure above simply ignores the invalid records. If you need notification of the "errors", you can use the following definition (executed via SqlCommand.ExecuteReader):

CREATE PROCEDURE dbo.ImportUsers
(
  @NewUserList  dbo.UserList READONLY
)
AS
SET NOCOUNT ON;

CREATE TABLE #TempUsers
(
  Id           INT NOT NULL,
  FirstName    NVARCHAR(50) NOT NULL,
  LastName     NVARCHAR(50) NULL,
  SerialNumber VARCHAR(50) NOT NULL,
  UserExists          BIT NOT NULL DEFAULT (0),
  InvalidSerialNumber BIT NOT NULL DEFAULT (0)
);

INSERT INTO #TempUsers (Id, FirstName, LastName, SerialNumber)
  SELECT  tmp.Id, tmp.FirstName, tmp.LastName, tmp.SerialNumber
  FROM    @NewUserList tmp;

-- Mark existing records
UPDATE tmp
SET    tmp.UserExists = 1
FROM   #TempUsers tmp
WHERE  EXISTS (SELECT *
               FROM   dbo.User usr
               WHERE  usr.Id = tmp.[Id]);

-- Mark invalid SerialNumber records
UPDATE tmp
SET    tmp.InvalidSerialNumber = 1
FROM   #TempUsers tmp
WHERE  tmp.UserExists = 0 -- no need to check already invalid records
AND    NOT EXISTS (SELECT *
                   FROM   dbo.SerialNumbers sn
                   WHERE  sn.SerialNumber = tmp.[SerialNumber]);

-- Insert remaining valid records
INSERT INTO dbo.User (Id, FirstName, LastName, SerialNumber)
  SELECT  tmp.Id, tmp.FirstName, tmp.LastName, tmp.SerialNumber
  FROM    #TempUsers tmp
  WHERE   tmp.UserExists = 0
  AND     tmp.InvalidSerialNumber = 0;

-- return temp table to caller as it contains validation info
SELECT  tmp.Id, tmp.FirstName, tmp.LastName, tmp.SerialNumber,
        tmp.UserExists, tmp.InvalidSerialNumber
FROM    #TempUsers tmp
-- optionally only return records that had a validation error
--  WHERE   tmp.UserExists = 1
--  OR     tmp.InvalidSerialNumber = 1;

When this version of the stored procedure completes, cycle through SqlDataReader.Read() to get the validation info.

Community
  • 1
  • 1
Solomon Rutzky
  • 46,688
  • 9
  • 128
  • 171