
I'm currently working on an ASP.NET Core project that uses Entity Framework Core with the SQL Server database provider. I have multiple entities with multiple relationships between them, and I want to learn how to insert these entities into the database in the most efficient way.

I have tried the tips from Fastest Way of Inserting in Entity Framework regarding inserting in batches: calling SaveChanges() after every 100-1,000 entities and disposing the context. However, the memory usage still rises towards 1 GB or more, that thread does not cover navigation properties, and the data I have is just a subset of the whole dataset. The data comes from an external Web API call, whose response I deserialize into entities.

Examples of my entities (simplified, not real entities in my project):

Student:

public class Student
{
    public int StudentId { get; set; }
    public string StudentName { get; set; }
    public List<Course> Courses { get; } = new List<Course>();
}

Course:

public class Course
{
    public int CourseId { get; set; }
    public string CourseName { get; set; }
    public List<Grade> Grades { get; } = new List<Grade>();
}

Grade:

public class Grade
{
    public int GradeId { get; set; }
    public string GradeValue { get; set; }
}

Student has a one-to-many relationship with Course.

Course has a one-to-many relationship with Grade.
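
For reference, here is a minimal sketch of a derived context matching this model. The SchoolContext name, connection string and configuration are simplified stand-ins for my real setup (EF Core would also discover these one-to-many relationships by convention from the collection navigation properties):

public class SchoolContext : DbContext
{
    public DbSet<Student> Students { get; set; }
    public DbSet<Course> Courses { get; set; }
    public DbSet<Grade> Grades { get; set; }

    protected override void OnConfiguring(DbContextOptionsBuilder options)
        => options.UseSqlServer("<connection string>");

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        // Student 1..* Course and Course 1..* Grade; the foreign keys are
        // left as shadow properties since the entities do not expose them.
        modelBuilder.Entity<Student>()
            .HasMany(s => s.Courses)
            .WithOne();

        modelBuilder.Entity<Course>()
            .HasMany(c => c.Grades)
            .WithOne();
    }
}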

An approximation of the amount of data I have:

  • Students: ~100-200
  • Courses: ~5 000 to 10 000
  • Grades: ~30

As you can see, the sheer number of entities can be quite large, and the relationships between them need to be kept intact. Right now I have organized the data so that I have a dictionary with a Student as key and a List<Course> as value, and each Course has a List<Grade>. The List<Grade> is populated while parsing the response from the external Web API call. The memory usage rises from about 100 MB to 300-400 MB when creating the dictionary.
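
For completeness, the dictionary is built roughly like this during parsing (the apiResponse shape and the DTO property names are illustrative placeholders, not the real API contract):

// Illustrative sketch only: apiResponse stands in for the deserialized Web API response.
// Requires System.Linq for Select.
var studentMap = new Dictionary<Student, List<Course>>();

foreach (var studentDto in apiResponse.Students)
{
    var student = new Student { StudentName = studentDto.Name };
    var courses = new List<Course>();

    foreach (var courseDto in studentDto.Courses)
    {
        var course = new Course { CourseName = courseDto.Name };
        // Grades are attached to each course while parsing
        course.Grades.AddRange(courseDto.Grades.Select(g => new Grade { GradeValue = g }));
        courses.Add(course);
    }

    studentMap.Add(student, courses);
}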

This is my current code (snippet):

Dictionary<Student, List<Course>> studentMap; // populated while parsing the Web API response
int numberOfStudents = 0;
DbContext context = null;
try
{
    context = new SchoolContext(); // my derived context (shown simplified above)
    context.ChangeTracker.AutoDetectChangesEnabled = false;
    foreach (var student in studentMap)
    {
        numberOfStudents++;
        context = AddToContext(context, student, numberOfStudents, 10, true);
    }
    context.SaveChanges();
}
finally
{
    context?.Dispose();
}


private DbContext AddToContext(DbContext context, KeyValuePair<Student, List<Course>> entity, int numberOfStudents, int commitCount, bool recreateContext)
{
    Student entityStudent = entity.Key;
    List<Course> list = entity.Value; // This list ranges from 5000-10000 entries, as mentioned before.
    entityStudent.Courses.AddRange(list);
    context.Set<Student>().Add(entityStudent);

    if (numberOfStudents % commitCount == 0)
    {
        context.SaveChanges();
        if (recreateContext)
        {
            context.Dispose();
            context = new SchoolContext(); // again, the derived context shown simplified above
            context.ChangeTracker.AutoDetectChangesEnabled = false;
        }
    }
    return context;
}

Memory usage rises to more than 1 GB, and inserting about 100 students with 30 000 courses and 3 grades per course takes about 5 minutes. Is there a better approach, especially since these are nested entities with navigation properties?

Best Regards,

dd321
  • Entity Framework is not really suited to inserting/updating large batches. Try to divide the batch into smaller batches and call context.SaveChanges() for every smaller batch. – martennis Apr 16 '19 at 13:23
  • You could pass the whole graph to SQL Server using JSON and parse it on the server side with OPENJSON. https://learn.microsoft.com/en-us/sql/t-sql/functions/openjson-transact-sql?view=sql-server-2017 – David Browne - Microsoft Apr 16 '19 at 13:33 (a rough sketch of this idea follows after the comments)
  • EF, in general, is not suited to working with large numbers of entities at one time. It keeps an object cache and applies change tracking to every property of every entity. For 30,000+ entities, that's obviously going to take up a fair amount of resources. You should consider using something like Dapper or even ADO.NET directly, at least for this particular task. – Chris Pratt Apr 16 '19 at 14:32
  • @ChrisPratt I thought disabling change tracking for every entity would be sufficient for that. I had been looking into Dapper before my implementation and will check that out. I feel like perhaps I do not need the features that EF Core offers, such as change tracking, lazy loading, etc. For my project I would only insert/update data in a nightly build pipeline, and use Power BI to visualize that data. – dd321 Apr 16 '19 at 17:43
  • You might not. EF essentially excels at one thing and one thing only: dynamic query building. Even that is mostly owed to LINQ, and you can get much the same functionality with just LINQ to SQL. However, the other features of EF such as change tracking, object fixup, etc. are often helpful timesavers. Still, especially if you're just doing CRUD, it's probably overkill, and going straight to the metal with ADO.NET, or at least mostly there with something like Dapper, will often serve you better. – Chris Pratt Apr 16 '19 at 17:50
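
To make the OPENJSON suggestion from the comments concrete, here is a rough, untested sketch of that approach: the whole graph is serialized to JSON once and shredded server-side in a single round trip. The connection string, column sizes and the SQL itself are assumptions; a real version would INSERT the shredded rows into the actual tables (or staging tables) instead of just selecting them.

// Rough sketch only (requires Newtonsoft.Json and System.Data.SqlClient);
// everything below is illustrative and not tested against the real schema.
string json = JsonConvert.SerializeObject(
    studentMap.Select(kv => new
    {
        kv.Key.StudentName,
        Courses = kv.Value.Select(c => new
        {
            c.CourseName,
            Grades = c.Grades.Select(g => new { g.GradeValue })
        })
    }));

const string sql = @"
    -- Shred the nested JSON into a flat rowset (one row per grade);
    -- a real import would INSERT ... SELECT into staging or target tables.
    SELECT s.StudentName, c.CourseName, g.GradeValue
    FROM OPENJSON(@studentsJson)
         WITH (StudentName nvarchar(200) '$.StudentName',
               Courses     nvarchar(max) '$.Courses' AS JSON) AS s
    CROSS APPLY OPENJSON(s.Courses)
         WITH (CourseName nvarchar(200) '$.CourseName',
               Grades     nvarchar(max) '$.Grades' AS JSON) AS c
    CROSS APPLY OPENJSON(c.Grades)
         WITH (GradeValue nvarchar(50) '$.GradeValue') AS g;";

using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand(sql, connection))
{
    command.Parameters.Add("@studentsJson", SqlDbType.NVarChar, -1).Value = json;
    connection.Open();
    using (var reader = command.ExecuteReader())
    {
        // Only demonstrates how the nested JSON is shredded into rows;
        // an actual import would execute a non-query INSERT or stored procedure.
    }
}

Going through plain ADO.NET or Dapper for this one import task, as suggested above, would likewise avoid the change-tracking overhead entirely.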
