
I have this code, which I found on StackOverflow:

public static class EntityFrameworkUtil
{
    // Flattens the chunked query back into a single sequence of items.
    public static IEnumerable<T> QueryInChunksOf<T>(this IQueryable<T> queryable, int chunkSize)
    {
        return queryable.QueryChunksOfSize(chunkSize).SelectMany(chunk => chunk);
    }

    // Runs the query page by page using Skip/Take, yielding one array per chunk.
    public static IEnumerable<T[]> QueryChunksOfSize<T>(this IQueryable<T> queryable, int chunkSize)
    {
        int chunkNumber = 0;
        while (true)
        {
            var query = (chunkNumber == 0)
                ? queryable
                : queryable.Skip(chunkNumber * chunkSize);
            var chunk = query.Take(chunkSize).ToArray();
            if (chunk.Length == 0)
                yield break;
            yield return chunk;
            chunkNumber++;
        }
    }
}

I use it like this:

int counter = 0;
foreach (var chunk in _sdb.Posts.OrderBy(c => c.Id).QueryChunksOfSize(100))
{
    foreach (var post in chunk)
    {
        post.Floor = CorrectFloor(post);
        post.RealAge = AgeCalculator(post);
        post.Area = TerasCheck(post);

        _sdb.Entry(post).State = System.Data.Entity.EntityState.Modified;
        counter++;
    }

    _sdb.SaveChanges();
    Console.WriteLine("SaveChanges worked, counter : " + counter);
}

This code is supposed to update 300,000 rows. It was fast at first, but now it runs very slowly and is only around the 30,000th row. Can you tell me why this code is running slowly, and how I can make it faster? Thanks.

  • I guess that's a db normalization issue. But usually you can improve performance by calling SaveChanges outside the loop. – Emad Nov 15 '16 at 05:35
  • @Emad Would calling SaveChanges for entire database be a good idea? – jason Nov 15 '16 at 06:04
  • I think the `SaveChanges()` is currently in the correct place as it saves after every 100 items (although I'd bump that up to 1000). I suspect the slowdown is related to change tracking. I'm no expert on EF, but I think using `AsNoTracking()` would help. `var chunk in _sdb.Posts.AsNoTracking().OrderBy(c => c.Id).QueryChunksOfSize(100))` in the first foreach. – Andy Nichols Nov 15 '16 at 09:11
  • For starters, remove `_sdb.Entry(post).State = System.Data.Entity.EntityState.Modified;`. It's not necessary to mark the whole record as modified. – Gert Arnold Nov 15 '16 at 09:35

1 Answer


I recommend the following:

  1. Use IQueryable<T> instead of IEnumerable<T>. Working through an IEnumerable<T> ends up issuing many separate queries against the database, which slows overall performance. Also, the OrderBy clause makes no sense here, because you want to update all the posts.

  2. Call SaveChanges once. Your example commits changes every 100 elements.

  3. Review your DbContext configuration to avoid lazy loading (include the properties you want to load) and change tracking (you can also call AsNoTracking() as proposed); the sketch after this list shows where these settings go:

    this.Configuration.LazyLoadingEnabled = false;
    this.Configuration.AutoDetectChangesEnabled = false;
    this.Configuration.ProxyCreationEnabled = false;
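These settings usually live in the context's constructor. A minimal sketch, assuming a hypothetical context class named PostsDbContext (whatever class actually sits behind `_sdb`), using EF6's DbContext configuration API:

using System.Data.Entity;

public class PostsDbContext : DbContext
{
    public PostsDbContext()
    {
        // Turn off features that add overhead during a large read-modify-write pass.
        this.Configuration.LazyLoadingEnabled = false;        // related data loads only through explicit Include
        this.Configuration.AutoDetectChangesEnabled = false;  // no change scan on every query / Add
        this.Configuration.ProxyCreationEnabled = false;      // plain entities, no dynamic proxies
    }

    public DbSet<Post> Posts { get; set; }
}

Note that with AutoDetectChangesEnabled set to false, the explicit `_sdb.Entry(post).State = ... Modified` line in the loop is what makes the edits persist, because EF no longer detects the property changes on its own.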

If this process takes long to execute, make sure it runs in an async function. Summarizing, something like this:

int counter = 0;
foreach (var post in _sdb.Posts.Include("xxx"))
{
    post.Floor = CorrectFloor(post);
    post.RealAge = AgeCalculator(post);
    post.Area = TerasCheck(post);

    _sdb.Entry(post).State = System.Data.Entity.EntityState.Modified;
    counter++;
}

_sdb.SaveChanges();
Console.WriteLine("SaveChanges worked, counter : " + counter);
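
If you want that single commit to run asynchronously, here is a minimal sketch using EF6's SaveChangesAsync. The wrapper method name is hypothetical, and it assumes it sits in the same class as `_sdb` and the CorrectFloor/AgeCalculator/TerasCheck helpers from the question (System.Threading.Tasks is needed for Task<int>):

// Hypothetical async variant of the loop above; the query enumeration itself stays synchronous.
public async Task<int> UpdatePostsAsync()
{
    int counter = 0;

    foreach (var post in _sdb.Posts.Include("xxx"))   // "xxx" is the answer's placeholder for the navigation property
    {
        post.Floor = CorrectFloor(post);
        post.RealAge = AgeCalculator(post);
        post.Area = TerasCheck(post);

        _sdb.Entry(post).State = System.Data.Entity.EntityState.Modified;
        counter++;
    }

    // EF6's SaveChangesAsync frees the calling thread while the single commit runs.
    await _sdb.SaveChangesAsync();

    Console.WriteLine("SaveChanges worked, counter : " + counter);
    return counter;
}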
  • I think the OrderBy is necessary. Otherwise you can't guarantee that each query will have the same order (it almost certainly will, but the query plan *could* change). Also, if Id is an identity column then we know any new records will appear at the end. This prevents taking 100 records, then having a new one inserted before them. – Andy Nichols Nov 15 '16 at 11:08