0

We have a old legacy system at work that stores binary data in our SQL Server. When we use our new .NET platform that are using EF6 to find data in the table with FindAsync, it spikes in memory and the performance is terrible compared to Find (the sync method)

Does anyone know a workaround or a way to fix this issue? I have created an issue with Microsoft already,mbut we are in a hurry since our customer are experiencing a 20 minute freeze on the server. The reason for the freeze seems to be that the memory doesn't get released fast enough for the garbage collection to clean the memory, and other scenarios where the server is too busy to clean up fast enough.

I have made a quick .NET 4.8 project with EF 6 to reproduce the error.

using System;
using System.Data.Entity;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

namespace PerfEFIssueFramework
{
    public class Item
    {
        public int Id { get; set; }
        public byte[] Data { get; set; }
    }

    public class ItemContext : DbContext
    {
        public DbSet<Item> Items { get; set; }

        public ItemContext() : base(@"Data Source=localhost;Initial Catalog=ItemDb;Integrated Security=False;User ID=user;Password=pw")
        { }
    }

    internal class Program
    {
        private static async Task Main(string[] args)
        {
            Console.WriteLine("Ready to consume a lot of memory with EF.");

            using (var db = new ItemContext())
            {
                db.Database.CreateIfNotExists();

                //insert dummy record
                if (db.Items.ToArray().Length == 0)
                {
                    db.Items.Add(new Item { Data = new byte[20 * 1024 * 1024] });
                    db.Items.Add(new Item { Data = new byte[40 * 1024 * 1024] });
                    db.Items.Add(new Item { Data = new byte[60 * 1024 * 1024] });
                    db.Items.Add(new Item { Data = new byte[80 * 1024 * 1024] });
                    db.Items.Add(new Item { Data = new byte[100 * 1024 * 1024] });
                    await db.SaveChangesAsync();
                }
            }

            // Find
            for (int i = 1; i < 6; i++)
            {
                // Find sync - No performance issues
                using (var db = new ItemContext())
                {
                    var stopwatch = Stopwatch.StartNew();
                    Console.WriteLine("Find sync method doesn't have performance and memory issue");
                    var item = db.Items.Find(i);
                    Console.WriteLine($"Record with id '{item.Id}' was fetched in {stopwatch.ElapsedMilliseconds}ms. Press any key to read again...");
                }

                // Find async - performance issues
                using (var db = new ItemContext())
                {
                    var stopwatch = Stopwatch.StartNew();
                    Console.WriteLine("Reproduce FindAsync performance and memory issue:");
                    var item = await db.Items.FindAsync(i);
                    Console.WriteLine($"Record with id '{item.Id}' was fetched in {stopwatch.ElapsedMilliseconds}ms. Press any key to read again...");
                }
            }

            using (var db = new ItemContext())
            {
                db.Database.Delete();
            }
        }
    }
}

Performance issue

Find and FindAsync calls to the database

ID 1 = 20mb
ID 2 = 40mb
ID 3 = 60mb
ID 4 = 80mb
ID 5 = 100mb

What we can clearly see is that find without running the async method, it takes between 150 to 350ms, but async are taking between 13000ms to 280000ms

Memory issue

Picture of the memory issue

  • With 2 mb binary data Find uses about 52 mb
  • With 2 mb binary data FindAsync uses about 96 mb

  • With 20 mb binary data Find uses about 63 mb

  • With 20 mb binary data FindAsync uses about 432 mb

(We found the same issue using EF Core)

HitzSPB
  • 158
  • 10
  • The code you posted can't provide usable results. You can't just use `Stopwatch` to get accurate durations, 6 iterations is far too little and you're actually including the "sync" times in the "async" watch - both calls happen in the *same* iteration so 1) the data is already cached at the server and 2) any garbage collections triggered by the "sync" version will actually be counted in the 'async' stopwatch – Panagiotis Kanavos Jun 04 '20 at 13:16
  • Can't contribute much, but I can say that reduced performance with EF async queries is something I stubled upon in the past... I think it was related to the `ToListAsync` then. Have a look at https://stackoverflow.com/q/28543293/5265292 – grek40 Jun 04 '20 at 13:16
  • For starters, use BenchmarkDotNet to get meaningful benchmarks *and* actual memory usage counts. The chainsaw image you posted is typical of leaking temporary objects - the code generates temporary objects that remain in memory until the GC runs to collect them. You'll see such an image if you try to process a large text file using naive string manipulation, as each operation generates a new temporary string. In this case, the test itself generates 300MB in RAM before it even starts. Those arrays have to be garbage-collected at some point. When that happens, the entire application will freeze – Panagiotis Kanavos Jun 04 '20 at 13:18
  • Hello Panagiotis Kanavos I think stop watch shows perfectly that there is a issue here. But you are somewhat right that to get accurate performance data it would be better to use something else. I will attempt to use the framework you provided. However it's obvious that ´Find´ vs ´FindAsync´ have a huge performance difference and memory usage. We have made many more test than just the one above. – HitzSPB Jun 04 '20 at 17:08
  • I found the same issue in EF Core :/ – HitzSPB Jun 05 '20 at 12:30
  • Does this answer your question? [Entity Framework async operation takes ten times as long to complete](https://stackoverflow.com/questions/28543293/entity-framework-async-operation-takes-ten-times-as-long-to-complete) – binco Oct 12 '20 at 15:33

1 Answers1

0

To use EF with tables with very large blobs you should use Table Splitting, and only fetch the entity with the blob as needed.

Or you can avoid ever loading the blob data into memory by removing the byte[] entity property, and replacing it with a streaming access method on your DbContext like this:

public Stream GetItemData(int Id)
{
    var con = (SqlConnection)Database.Connection;

    if (con.State != System.Data.ConnectionState.Open)
        con.Open();

    var cmd = con.CreateCommand();
    cmd.CommandText = "select data from Items where id = @id";
    cmd.Parameters.Add(new SqlParameter("@id", Id));

    var rdr = cmd.ExecuteReader(System.Data.CommandBehavior.SequentialAccess);
    var stream = rdr.GetStream(0);
    return stream;            
}
David Browne - Microsoft
  • 80,331
  • 6
  • 39
  • 67
  • How is this answer relevant, when the example shows a performance hit with just `Id` and `Data` column? There's not much to be split here, still the async methods perform significantly worse than the sync methods. And *please*, who uses model first in modern EF applications? – grek40 Jun 04 '20 at 15:07
  • Because the answer is "don't do that". Even if FindAsync is somehow extra bad, you shouldn't have entities with large blobs loaded in memory. And if you do, it should be as rarely as possible. – David Browne - Microsoft Jun 04 '20 at 15:42
  • David Browne is easy to say the famous sentence "Don't do that" but we are speaking about a old system that is actually made in Delphi and is being converted to .Net. It is impossible to fix everything at once. I considered your answer, but it doesn't not solve the issue in any way since is 1 blob and not multiple. Neither does it explain why `Find` vs `FindSync` performs so terribly. I have some theories that perhaps could explain why async performance so badly, and it is because of the cancellation part of async. but I have not managed to find the place in the code yet. – HitzSPB Jun 04 '20 at 17:01
  • Also is only Entity Framework that has this issue. Our old delphi code, and reading the data with raw sql creates no issues reading this from the SQL server. If it had anything to do with the SQL server, or generally reading blob from SQL server it would impact everything. Even through you are correct that with the more modern view on development... this should never have been in the SQL server in the first place. But is not a excuse for EF performing so terribly when using async. – HitzSPB Jun 04 '20 at 17:05
  • I agree there's an EF issue here, but you have already opened an issue. As a workaround, avoid loading the blobs in memory (see update for an additional idea on that), or use the Find over FindAsync. – David Browne - Microsoft Jun 04 '20 at 17:45