
I am retrieving data about books from lots of different sources, such as XML and web services, which I then store in the database using EF Code First 6 via a generic repository and, of course, DbContext.

The problem is that performance is very bad.

I have the following (fictional but analogous) POCO in my Model

public class Book
{
    public int Id {get; set;}
    public string Title {get; set;}
    public bool Deleted {get; set;}
}

also

public class BookDataSource
{
    public int Id {get; set;}
    public virtual List<Book> Books {get; set;}
}

So I retrieve the book data from some source and construct the above book object.

I then need to check whether the book already exists in the DB and update it if it does or insert it if it does not. I also need to then delete any books that no longer exist on the data source.

//The following method takes the data source (type: IBookDataSource) to update from as the parameter

public string UpdateBooks(BookDataSource dataSource)
{
    string successMessage = "";

    //Disconnected entities
    List<Book> retrievedBooks = dataSource.RetrieveBooks();

    foreach (Book retrievedBook in retrievedBooks)
    {
        //Check if the dataSource already contains a book (based on title)
        Book localBook =
            dataSource.Books.SingleOrDefault(
                b => b.Title == retrievedBook.Title);

        if (localBook == null)
        {
            //Insert a new one
            _unitOfWork.BookRepository.Insert(retrievedBook);
        }
        else
        {
            //Update existing
            localBook.Title = retrievedBook.Title;
            _unitOfWork.BookRepository.Update(localBook);
        }
    }

    //Soft delete any existing ones that no longer exist in the retrieved data
    foreach (Book existingBook in dataSource.Books)
    {
        if (!retrievedBooks.Exists(
                b => b.Title == existingBook.Title))
        {
            existingBook.Deleted = true;
            _unitOfWork.BookRepository.Update(existingBook);
        }
    }

    return successMessage;
}

However, the performance is very bad. Sometimes there are 25,000 books retrieved from the data source, and I have to do two loops: for each retrieved book, check whether it exists in the DB, then insert/update accordingly; then another loop over all existing books to check whether each one no longer exists on the data source and soft delete it.

Is there a better way to attach the entities and monitor their state? In the above example I think I am querying the context each time and not the DB, so why such bad performance? Should I revert to T-SQL?

Gert Arnold
Dan Cook

2 Answers


For the proper algorithm for inserting, updating, and deleting disconnected entities, you can check the "Setting the State of Entities in a Graph" section of Chapter 4, "Working with Disconnected Entities Including N-Tier Applications", in the book "Programming Entity Framework: DbContext" by Julia Lerman and Rowan Miller.
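As a sketch of that pattern (the method name here is illustrative, and it assumes you have access to the underlying DbContext rather than only the repository): since a disconnected entity's state is unknown to the context, you set it explicitly instead of querying for it first.

```csharp
// Sketch only, assuming direct access to the DbContext.
// Lerman/Miller pattern: a Book with Id == 0 was never saved,
// so it is an insert; anything else is treated as an update.
public void ApplyChanges(DbContext context, IEnumerable<Book> retrievedBooks)
{
    foreach (Book book in retrievedBooks)
    {
        // Setting the State attaches the entity to the context.
        context.Entry(book).State =
            book.Id == 0 ? EntityState.Added : EntityState.Modified;
    }
    context.SaveChanges();
}
```

This avoids the per-book `SingleOrDefault` round trip entirely, at the cost of trusting the `Id` to tell you whether the row exists.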

Also, in this SO answer some ways to increase the performance of EF are explained. The answer is about bulk inserting, but it may work for your scenario as well.
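The biggest single win for loops like the one in the question is usually turning off automatic change detection while adding many entities, one of the techniques covered there. A minimal sketch (assuming a `Books` DbSet on your context):

```csharp
// Sketch: with AutoDetectChangesEnabled on (the default), every Add
// scans all tracked entities, making 25,000 Adds quadratic in cost.
context.Configuration.AutoDetectChangesEnabled = false;
context.Configuration.ValidateOnSaveEnabled = false;
try
{
    foreach (Book book in retrievedBooks)
    {
        context.Books.Add(book);
    }
    context.SaveChanges();
}
finally
{
    // Restore the defaults so normal tracking behavior resumes.
    context.Configuration.AutoDetectChangesEnabled = true;
    context.Configuration.ValidateOnSaveEnabled = true;
}
```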

Oncel Umut TURER

The fastest way would be using a bulk insert extension.

Here's maxlego's description:

It uses SqlBulkCopy and a custom data reader to get max performance. As a result it is over 20 times faster than using a regular insert or AddRange (EntityFramework.BulkInsert vs EF AddRange).

context.BulkInsert(hugeAmountOfEntities);
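Wired into the question's scenario, only the inserts go through `BulkInsert`; updates and soft deletes still use the normal path. A sketch, assuming the EntityFramework.BulkInsert NuGet package and a `Books` DbSet:

```csharp
// Sketch, assuming the EntityFramework.BulkInsert extension package.
// Existing titles are loaded once into a HashSet, so membership tests
// are O(1) instead of a per-book SingleOrDefault lookup.
HashSet<string> existingTitles =
    new HashSet<string>(context.Books.Select(b => b.Title));

List<Book> newBooks = retrievedBooks
    .Where(b => !existingTitles.Contains(b.Title))
    .ToList();

// BulkInsert writes directly to the database via SqlBulkCopy,
// bypassing the change tracker entirely.
context.BulkInsert(newBooks);
```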
user1089766