I am retrieving data about books from lots of different sources such as XML and web services which I then store in the database using EF Code First 6 via a Generic Repository and and obviously DbContext.
The problem is that performance is very bad.
I have the following (fictional but analogous) POCO in my Model
public class Book
{
public int Id {get; set;}
public string Title {get; set;}
}
also
public class BookDataSource
{
public int Id {get; set;}
public virtual List<Book> Books {get; set;};
}
So I retrieve the book data from some source and construct the above book object.
I then need to check whether the book already exists in the DB and update it if it does or insert it if it does not. I also need to then delete any books that no longer exist on the data source.
//The following method takes the data source (type: IBookDataSource) to update from as the parameter
public string UpdateBooks(BookDatasource dataSource)
{
string successMessage = "";
//Disconnected entities
List<Book> retreivedBooks= dataSource.RetreiveBooks();
foreach (Book retreivedBook in retreivedBooks)
{
//Check if the dataSource already contains a book (based on title)
Book localBook =
dataSource.Books.SingleOrDefault(
b => b.Title== retreivedBook.Title);
if (localBook ==null)
{
//Insert a new one
_unitOfWork.BookRepository.Insert(retreivedBook);
}
else
{
//Update existing
localBook.Title= retreivedPortalMerchant.PortalsMerchantName;
_unitOfWork.PortalMerchantRepository.Update(localPortalMerchant);
}
}
//Soft delete any existing ones that no longer exist in the received data
foreach (Book existingBook in dataSource.Books)
{
if ( !retreivedBooks.Exists(
b => m.Title == existingBook.Title))
{
existingBook.Deleted = true;
_unitOfWork.PortalMerchantRepository.Update(existingBook);
}
}
}
However the performance is very bad. Sometimes there are 25000 books retrieved from the data source and I am having to do two for loops. ForEach retreived book, check if one exists in the db the insert/update accordingly. And another one to loop all existing books and check whether it no longer exists on the datasource and soft delete.
Is there a better way to attach the entities and monitor their state. In the above example I think I am querying the context each time and not the DB so why such bad performance. Should I revert to T-SQL?