
I have a Save object with several associated collections. The total size of the objects is as follows:

[screenshot: object counts per collection]

The relations between the objects can be inferred from this mapping and appear to be correctly represented in the database. Querying also works just fine.

modelBuilder.Entity<Save>().HasKey(c => c.SaveId).HasAnnotation("DatabaseGenerated", DatabaseGeneratedOption.Identity);
modelBuilder.Entity<Save>().HasMany(c => c.Families).WithOne(x => x.Save).HasForeignKey(x => x.SaveId);
modelBuilder.Entity<Save>().HasMany(c => c.Countries).WithOne(x => x.Save).HasForeignKey(x => x.SaveId);
modelBuilder.Entity<Save>().HasMany(c => c.Provinces).WithOne(x => x.Save).HasForeignKey(x => x.SaveId);
modelBuilder.Entity<Save>().HasMany(c => c.Pops).WithOne(x => x.Save).HasForeignKey(x => x.SaveId);

modelBuilder.Entity<Country>().HasOne(c => c.Save);
modelBuilder.Entity<Country>().HasMany(c => c.Technologies).WithOne(x => x.Country).HasForeignKey(x => new {x.SaveId, x.CountryId});
modelBuilder.Entity<Country>().HasMany(c => c.Players).WithOne(x => x.Country).HasForeignKey(x => new {x.SaveId, x.CountryId});
modelBuilder.Entity<Country>().HasMany(c => c.Families).WithOne(x => x.Country).HasForeignKey(x => new {x.SaveId, x.OwnerId});
modelBuilder.Entity<Country>().HasMany(c => c.Provinces).WithOne(x => x.Owner);
modelBuilder.Entity<Country>().HasKey(c => new { c.SaveId, c.CountryId });

modelBuilder.Entity<Family>().HasKey(c => new { c.SaveId, c.FamilyId });
modelBuilder.Entity<Family>().HasOne(c => c.Save);

modelBuilder.Entity<CountryPlayer>().HasKey(c => new { c.SaveId, c.CountryId, c.PlayerName });
modelBuilder.Entity<CountryPlayer>().HasOne(c => c.Country);
modelBuilder.Entity<CountryPlayer>().Property(c => c.PlayerName).HasMaxLength(100);

modelBuilder.Entity<CountryTechnology>().HasKey(c => new { c.SaveId, c.CountryId, c.Type });
modelBuilder.Entity<CountryTechnology>().HasOne(c => c.Country);

modelBuilder.Entity<Province>().HasKey(c => new { c.SaveId, c.ProvinceId });
modelBuilder.Entity<Province>().HasMany(c => c.Pops).WithOne(x => x.Province);
modelBuilder.Entity<Province>().HasOne(c => c.Save);

modelBuilder.Entity<Population>().HasKey(c => new { c.SaveId, c.PopId });
modelBuilder.Entity<Population>().HasOne(c => c.Province);
modelBuilder.Entity<Population>().HasOne(c => c.Save);

I parse the entire save from a file, so I can't add the collections one by one. After parsing I have a Save with all its associated collections, adding up to about 80k objects, none of which are present in the database yet.

Then, when I call `dbContext.Add(save)`, it takes around 44 seconds to process, with RAM usage going up from 100 MB to around 700 MB.

Then, when I call `dbContext.SaveChanges()` (I also tried the `BulkSaveChanges()` method from EF Extensions, with no significant difference), it takes an additional 60 seconds, with RAM usage going up to 1.3 GB.

What is going on here? Why does it take so long and use so much memory? The actual upload to the database only accounts for about the last 5 seconds.

PS: I also tried disabling change detection with no effect.
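For reference, a minimal sketch of what that looks like in EF Core, assuming `_db` is the ImperatorContext shown further down; note that this flag only skips the automatic DetectChanges scans, while Add still has to discover and track the whole 80k-object graph:

// Sketch: disable automatic change detection around the bulk add.
_db.ChangeTracker.AutoDetectChangesEnabled = false;
try
{
    _db.Saves.Add(save);   // graph discovery and tracking still happen here
    _db.SaveChanges();
}
finally
{
    _db.ChangeTracker.AutoDetectChangesEnabled = true;
}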

PS2: Actual usage and full code as requested in comments:

public class HomeController : Controller
{
    private readonly ImperatorContext _db;

    public HomeController(ImperatorContext db)
    {
        _db = db;
    }

    [HttpPost]
    [RequestSizeLimit(200000000)]
    public async Task<IActionResult> UploadSave(List<IFormFile> files)
    {
        [...]
        await using (var stream = new FileStream(filePath, FileMode.Open))
        {
            var save = ParadoxParser.Parse(stream, new SaveParser());
            if (_db.Saves.Any(s => s.SaveKey == save.SaveKey))
            {
                 response = "The save you uploaded already exists in the database.";
            }
            else
            {
                 _db.Saves.Add(save);
            }
            _db.BulkSaveChanges();
        }
        [...]
    }

}
Pinx0
  • Please show the actual code, and show how you are using `BulkSaveChanges()` – jalsh Jan 28 '20 at 17:33
  • https://stackoverflow.com/questions/5943394/why-is-inserting-entities-in-ef-4-1-so-slow-compared-to-objectcontext/5943699#5943699 – Neil Jan 28 '20 at 17:33
  • You shouldn't try to save 80k records at once. Try just saving say 1000 at a time. – Neil Jan 28 '20 at 17:34
  • @jalsh added, though I don't think it adds any further info. – Pinx0 Jan 28 '20 at 17:40
  • @Neil How can I do that if all the dependent objects are inside the `save` object? I can't loop through it, as there is only one object at the top. – Pinx0 Jan 28 '20 at 17:40
  • Welcome to the amazing world of ORM. Data access code is too important and performance-critical to be designed by a machine. – theMayer Jan 28 '20 at 17:42
  • Entity Framework isn't the best option for bulk operations. Maybe, in this case, try creating a stored procedure and passing the data in as JSON? I don't know MariaDB. You can still call the sproc via EF. – Matt M Jan 28 '20 at 17:48
  • @theMayer That is true! However, sometimes we just don't have the luxury of writing all that code :( He's going to have to make some tweaks to make it acceptable. – Train Jan 28 '20 at 17:49
  • Put your "Saves" collection in a dictionary or hashset. Searching with 'Any' is quite expensive; it is a simple linear search. You have one file per item? Just opening the file must already be much slower than executing the insert. And if you call BulkSaveChanges for each single item, that is not a "BulkInsert". – Holger Jan 28 '20 at 18:14
  • @Holger there is only one save. The Any search is extremely fast, as it is converted by EF to an EXISTS select in SQL. The problem is not there – Pinx0 Jan 28 '20 at 18:41
  • @Pinx0 So you really parse only one file? Not a list of files? OK. The documentation says clearly: *When should I use BulkSaveChanges over SaveChanges? Whenever you have more than one entity to save.* You save only one entity; related entities don't count. EF is not made for something like a complete database copy in one operation. You can be happy it works at all. A more specialized implementation that does not involve building up an object hierarchy (parsing) just to keep it in memory for a few seconds before saving it to SQL might be more appropriate. Streaming conversion would be the best – Holger Jan 28 '20 at 22:51
  • @MattM there is a 3rd-party library for Entity Framework Core with much better bulk-insert performance, around 500x faster. Regular EF Core AddRange takes 4 seconds to save 10k records; the 3rd-party bulk insert takes only 0.003 sec to save 10k records. You can check my answer. – Asherguru Jan 29 '20 at 09:31
  • I get why you want to save the complete object graph in one go, but as you say, it doesn't meet your performance expectations, so you will have to find another way. How about loading the file into a 'different' object graph, then extracting the different 'tables' individually and storing them in the database? That way you can do partial inserts of 1000 at a time. – Neil Jan 29 '20 at 10:11
  • @Neil Yes, that is just what I thought and tried. Using `BulkInsert` individually on the 5 main collections (roughly as sketched below), the total time has gone down to 28 seconds. It still feels like too much considering the amount of data, but it's something. – Pinx0 Jan 29 '20 at 14:14
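A rough sketch of that per-collection approach, assuming the navigation property names from the mapping above and a `BulkInsert` extension defined on the DbContext (as the bulk libraries discussed here provide; exact signatures vary by library):

// Hypothetical sketch: insert the parsed graph one collection at a time,
// parents before children so the foreign keys already exist.
_db.BulkInsert(new List<Save> { save });
_db.BulkInsert(save.Countries.ToList());
_db.BulkInsert(save.Families.ToList());
_db.BulkInsert(save.Provinces.ToList());
_db.BulkInsert(save.Pops.ToList());
// Nested collections would be flattened similarly, e.g.:
_db.BulkInsert(save.Countries.SelectMany(c => c.Technologies).ToList());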

3 Answers


Install EFCore.BulkExtensions from NuGet.

Remove "_db.BulkSaveChanges();" and replace "_db.Saves.Add(save);" with this code

_db.Saves.BulkInsert(save);
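A plain BulkInsert of the root typically writes only the Save rows themselves; in EFCore.BulkExtensions the related collections can, as far as I can tell (verify against the version you install), be included via the IncludeGraph option on BulkConfig:

// Sketch, assuming EFCore.BulkExtensions' BulkConfig.IncludeGraph option.
_db.BulkInsert(new List<Save> { save }, new BulkConfig { IncludeGraph = true });

Alternatively, each child collection can be bulk-inserted separately, as in the sketch under the question's comments.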
Asherguru

I would suggest you take a look at N.EntityFrameworkCore.Extensions. It is a bulk extension framework for EF Core 6.0.8+.

Install-Package N.EntityFrameworkCore.Extensions

https://www.nuget.org/packages/N.EntityFrameworkCore.Extensions

Once you install the NuGet package, you can simply use the BulkInsert() method directly on the DbContext instance. It supports BulkDelete, BulkInsert, BulkMerge and more.

BulkDelete()

var dbcontext = new MyDbContext();  
var orders = dbcontext.Orders.Where(o => o.TotalPrice < 5.35M);  
dbcontext.BulkDelete(orders);

BulkInsert()

var dbcontext = new MyDbContext();
var orders = new List<Order>();
for (int i = 0; i < 10000; i++)
{
    // TotalPrice appears to be decimal (see the 5.35M comparison above), so the literal needs the M suffix
    orders.Add(new Order { OrderDate = DateTime.UtcNow, TotalPrice = 2.99M });
}
dbcontext.BulkInsert(orders);
Northern25

EDIT:

1. Make sure it's not the DB that's the issue. Execute your own command directly to see how fast it runs.

2. Keep the active context graph small by using a new context for each unit of work, and also try turning off AutoDetectChangesEnabled.

3. Batch a number of commands together (a sketch follows below).

Here is a good article on Entity Framework and slow bulk INSERTs.
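A minimal sketch of points 2 and 3 combined, assuming a fresh ImperatorContext can be created per batch; the helper name, the createContext delegate and the batch size are placeholders, not part of any library:

// Hypothetical sketch: insert entities in fixed-size batches, each with its own
// short-lived context, so the change tracker never holds the full 80k-object graph.
static void SaveInBatches<T>(IReadOnlyList<T> entities,
                             Func<ImperatorContext> createContext,
                             int batchSize = 1000) where T : class
{
    for (var i = 0; i < entities.Count; i += batchSize)
    {
        using var db = createContext();
        db.ChangeTracker.AutoDetectChangesEnabled = false;
        db.AddRange(entities.Skip(i).Take(batchSize));
        db.SaveChanges();
    }
}

The root-level collections from the question (Countries, Families, Provinces, Pops) could each be passed through such a helper, parents before children.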

Train
  • It is not SQL (it's MariaDB, by the way); the slow part begins in the Add method, where the database isn't yet contacted – Pinx0 Jan 28 '20 at 17:42
  • Sorry, I missed that part and just automatically assumed that it was SQL. The same principle still applies, because it can still help with performance. Can you show us all the relevant code? That would be more helpful. – Train Jan 28 '20 at 17:46
  • I added the full code, hope it helps you understand the issue ;) – Pinx0 Jan 28 '20 at 17:52
  • Thanks, just another couple of questions. What does `BulkSaveChanges` do? I want to try to find a solution with the minimal amount of round trips to the db. Have you tried something like this? https://entityframeworkcore.com/saving-data-bulk-insert? it does just that for you and is easy to use. – Train Jan 28 '20 at 18:05
  • BulkSaveChanges is from the same library you linked me. However I will test bulk insert as well. – Pinx0 Jan 28 '20 at 18:06
  • Sounds good, please let me know how it goes. If it's still slow we should probably find another solution outside of the ORM. Probably like a stored procedure that was mentioned in the comments. – Train Jan 28 '20 at 18:08