1

I'm trying to setup something for a movie store website (using ASP.NET, EF4, SQL Server 2008), and in my scenario, I want to allow a "Member" store to import their catalog of movies stored in a text file containing ActorName, MovieTitle, and CatalogNumber as follows:

Actor, Movie, CatalogNumber
John Wayne, True Grit, 4577-12 (repeated for each record)

This data will be used to lookup an actor and movie, and create a "MemberMovie" record, and my import speed is terrible if I import more than 100 or so records using these tables:

  • Actor Table: Fields = {ID, Name, etc.}
  • Movie Table: Fields = {ID, Title, ActorID, etc.}
  • MemberMovie Table: Fields = {ID, CatalogNumber, MovieID, etc.}

My methodology to import data into the MemberMovie table from a text file is as follows (after the file has been uploaded successfully):

  1. Create a context.
  2. For each line in the file, lookup the artist in the Actor table.
  3. For each Movie in the Artist table, lookup the matching title.
  4. If a matching Movie is found, add a new MemberMovie record to the context and call ctx.SaveChanges().

The performance of my implementation is terrible. My expectation is that this can be done with thousands of records in a few seconds (after the file has been uploaded), and I've got something that times out the browser.

My question is this: What is the best approach for performing bulk lookups/inserts like this? Should I call SaveChanges only once rather than for each newly created MemberMovie? Would it be better to implement this using something like a stored procedure?

A snippet of my loop is roughly this (edited for brevity):

while ((fline = file.ReadLine()) != null)
{
    string [] token = fline.Split(separator);

    string Actor = token[0];
    string Movie = token[1];
    string CatNumber = token[2];

    Actor found_actor = ctx.Actors.Where(a => a.Name.Equals(actor)).FirstOrDefault();

    if (found_actor == null)
        continue;

    Movie found_movie = found_actor.Movies.Where( s => s.Title.Equals(title, StringComparison.CurrentCultureIgnoreCase)).FirstOrDefault();

    if (found_movie == null)
        continue;

    ctx.MemberMovies.AddObject(new MemberMovie()
    {
        MemberProfileID = profile_id,
        CatalogNumber = CatNumber,
        Movie = found_movie
    });

    try
    {
        ctx.SaveChanges();
    }
    catch
    {
    }
}

Any help is appreciated!

Thanks, Dennis

Dennis Ward
  • 747
  • 3
  • 8
  • 21
  • EF has poor performance for this kind of tasks. Any import, migration etc. is slow because EF doesn't support bulk operations. First add indexes to your tables and try to reduce number of roundtrips to DB. For example now if you have 50 records with John Wayne you will query Actor table and all related Movies 50 times. It will require redefining your algorithm. You can also try to load all required actors together. You will reduce processing time but increase actual processing and memory load. – Ladislav Mrnka Dec 28 '10 at 13:32

1 Answers1

1

First:

Some time ago I wrote an answer about calling SaveChanges after 1, n or all rows:

When should I call SaveChanges() when creating 1000's of Entity Framework objects? (like during an import)

It is actually better to call SaveChanges after more than 1 row, but not after all.

Second:

Make sure you have index on name in Actors table and title in Movies, that should help. Also you shouldn't select whole Actor, if you need only his ID:

Instead of:

Actor found_actor = ctx.Actors.Where(a => a.Name.Equals(actor)).FirstOrDefault();

you can select:

int? found_actor_id = ctx.Actors.Where(a => a.Name.Equals(actor)).Select(a => a.ID).FirstOrDefault();

and then

Something.ActorID = found_actor_id;

This can be faster, because doesn't require whole Actor entity and doesn't require additional lookups, specially when combined with index.

Third:

If you send a very large file, there is still probability of timeout, even with good performance. You should run this import in separate thread and return response immediately. You can give some kind of identifier to every import and allow user to check status by this ID.

Community
  • 1
  • 1
LukLed
  • 31,452
  • 17
  • 82
  • 107
  • Lukasz - Thanks for the detailed answer, I really appreciate the time and also the link to your other question/answer also had really good information. – Dennis Ward Jan 09 '11 at 19:42