public class Student
{
    public int StudentId { get; set; }
    public string StudentName { get; set; }
    public int CourseId { get; set; }
    public virtual Course Courses { get; set; }
}

public class Course
{
    public int CourseId { get; set; }
    public string CourseName { get; set; }
    public string Description { get; set; }
    public ICollection<Student> Students { get; set; }
    public ICollection<Lecture> Lectures { get; set; }
}

public class Lecture
{
    public int LectureId { get; set; }
    public string LectureName { get; set; }
    public int CourseId { get; set; }
    public virtual Course Courses { get; set; }
}

What is the keyword virtual used for here?

I was told that virtual is for lazy loading, but I don't understand why, because when we do

_context.Lecture.FirstOrDefault()

the result is the first Lecture, and it does not include the related Course.

To get the Lecture with the Course, we have to use:

_context.Lecture.Include("Courses").FirstOrDefault()

So even without the virtual keyword, this already looks like lazy loading.

Then why do we need the keyword?

marc_s
ltog
  • Use `virtual` if you have enabled lazy loading. `Include` is an explicit instruction to load a related entity and has nothing to do with lazy loading. Lazy loading is when you access `Courses` and EF loads that entity with a separate query. Also, EF and EF Core differ here, so specify which one you use. – Svyatoslav Danyliv Aug 15 '22 at 17:28

1 Answer

By declaring the property virtual, you allow EF to substitute the entity with a runtime-generated proxy subclass that overrides the property to enable lazy loading. Using Include() tells the EF query to eager-load the related data.

In EF6 and earlier, lazy loading was enabled by default. In EF Core it is disabled by default, and the earliest versions (before EF Core 2.1) did not support it at all.
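For EF Core, opting in to proxy-based lazy loading could look roughly like the sketch below. It assumes the Microsoft.EntityFrameworkCore.Proxies package is installed; the MyAppDbContext name and the trimmed-down entities are placeholders for illustration, and the database provider call is left out. Note that once proxies are enabled, EF Core expects every navigation property to be virtual and the entity classes to be public and unsealed.

```csharp
using System.Collections.Generic;
using Microsoft.EntityFrameworkCore;

public class Course
{
    public int CourseId { get; set; }
    public string CourseName { get; set; }
    public virtual ICollection<Lecture> Lectures { get; set; } = new List<Lecture>();
}

public class Lecture
{
    public int LectureId { get; set; }
    public string LectureName { get; set; }
    public int CourseId { get; set; }
    public virtual Course Course { get; set; }
}

// Placeholder context name; adjust to your own project.
public class MyAppDbContext : DbContext
{
    public DbSet<Lecture> Lecture { get; set; }
    public DbSet<Course> Courses { get; set; }

    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
    {
        // UseLazyLoadingProxies comes from Microsoft.EntityFrameworkCore.Proxies.
        // A provider call (UseSqlServer, UseSqlite, ...) must be chained here
        // as well; it is omitted because it depends on your database.
        optionsBuilder.UseLazyLoadingProxies();
    }
}
```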

Take the following query:

var lecture = _context.Lecture.Single(x => x.LectureId == lectureId);

to load one lecture.

If you omit virtual, then accessing lecture.Course would do one of two things. If the DbContext (_context) was not already tracking an instance of the Course that lecture.CourseId points at, lecture.Course would return null. If the DbContext was already tracking that instance, then lecture.Course would return that instance. So without lazy loading you might or might not get a reference; don't count on it being there.

With virtual and lazy loading in the same scenario, the proxy checks if the Course has been provided by the DbContext and returns it if so. If it hasn't been loaded then it will automatically go to the DbContext if it is still in scope and attempt to query it. In this way if you access lecture.Course you can count on it being returned if there is a record in the DB.
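To see why the keyword matters, here is a hand-written sketch of roughly what such a proxy does. It is not EF's actual generated code: LectureProxy, the loader delegate, and ProxyDemo are hypothetical names, and the delegate stands in for "go to the DbContext and query". The point is that a subclass can only intercept reads of Course because the property is virtual.

```csharp
using System;

public class Course
{
    public int CourseId { get; set; }
    public string CourseName { get; set; }
}

public class Lecture
{
    public int LectureId { get; set; }
    public int CourseId { get; set; }

    // virtual is what allows a subclass to intercept reads of this property.
    public virtual Course Course { get; set; }
}

// Hypothetical stand-in for the proxy class EF generates at runtime.
public class LectureProxy : Lecture
{
    private readonly Func<int, Course> _loadCourse; // stand-in for a DbContext query
    private bool _courseLoaded;

    public LectureProxy(Func<int, Course> loadCourse)
    {
        _loadCourse = loadCourse;
    }

    public override Course Course
    {
        get
        {
            // Load on first access only; later reads reuse the stored value.
            if (!_courseLoaded)
            {
                base.Course = _loadCourse(CourseId);
                _courseLoaded = true;
            }
            return base.Course;
        }
        set
        {
            base.Course = value;
            _courseLoaded = true;
        }
    }
}

public static class ProxyDemo
{
    public static int QueryCount;

    public static string Run()
    {
        QueryCount = 0;
        Lecture lecture = new LectureProxy(id =>
        {
            QueryCount++; // count simulated database round trips
            return new Course { CourseId = id, CourseName = "Course " + id };
        })
        { LectureId = 1, CourseId = 7 };

        var first = lecture.Course.CourseName;  // triggers the simulated query
        var second = lecture.Course.CourseName; // served from the loaded value
        return first + "/" + second + "/" + QueryCount;
    }
}
```

Calling code only ever sees a Lecture; without virtual on Course, the override would not compile and there would be nowhere to hook the load in.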

Think of lazy loading as a safety net. It comes with a potentially significant performance cost if relied on, but one could argue that a performance hit is the lesser of two evils compared to runtime bugs caused by inconsistent data. This is most evident with collections of related entities. In your example above, ICollection<Student> and the other collection navigation properties should be marked virtual as well so that they can lazy load. Without that, you would get back whatever students happened to be tracked at the time, which can leave the data in a very inconsistent state at runtime.

Take for example two courses, Course #1 and Course #2, and four students, A, B, C, and D. All four are registered to Course #1, and only A & B are registered to Course #2. If we disable lazy loading by removing virtual, the behavior changes depending on which course we load first, if we happen to eager-load in one case and forget to in the other...

using (var context = new MyAppDbContext())
{
    var course1 = context.Courses
        .Include(x => x.Students)
        .Single(x => x.CourseId == 1);
    var course2 = context.Courses
        .Single(x => x.CourseId == 2);

    var studentCount = course2.Students.Count();
}

Disclaimer: With collections in entities you should ensure these are always initialized so they are ready to go. This can be done in the constructor or on an auto-property:

public ICollection<Student> Students { get; set; } = new List<Student>();

In the example above, studentCount would come back as "2": loading Course #1 with Include(x => x.Students) brought all four students into the DbContext, including Students A & B, so those two tracked instances get attached to Course #2 when it is loaded. This is a pretty obvious example, loading the two courses right after one another, but the same situation can easily occur when loading multiple records that share data, such as search results. It is also affected by how long the DbContext has been alive. This example uses a using block to scope a new DbContext instance; one scoped to the web request or similar could already be tracking related instances from earlier in the call.

Now reverse the scenario:

using (var context = new MyAppDbContext())
{
    var course2 = context.Courses
        .Include(x => x.Students)
        .Single(x => x.CourseId == 2);
    var course1 = context.Courses
        .Single(x => x.CourseId == 1);

    var studentCount = course1.Students.Count();
}

In this case, only Students A & B were eager loaded. While Course 1 actually references 4 students, studentCount here would return "2" for the two students associated with Course 1 that the DbContext was tracking when Course 1 was loaded. You might expect 4, or 0 knowing that you didn't eager-load the students. The resulting related data is unreliable and what you might or might not get back will be situational.

Where lazy loading gets expensive is when loading sets of data. Say we load a list of 100 students and, when working with those students, we access student.Course. Eager loading will generate 1 SQL statement to load the 100 students and their related courses. Lazy loading will end up executing 1 query for the students, then up to 100 further queries, one per student, to load each student's course. (E.g. SELECT * FROM Courses WHERE CourseId = 1; SELECT * FROM Courses WHERE CourseId = 2; ...) If Student had several lazy-loaded properties, then that's up to another 100 queries per lazy-loaded property.
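As a rough illustration of the two access patterns, the sketch below contrasts them in EF Core. The entity shapes, MyAppDbContext, and the StudentQueries helper are assumptions made for the example (the navigation is named Course here to match the prose), and the lazy variant only works if lazy-loading proxies are enabled on the context.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.EntityFrameworkCore;

public class Course
{
    public int CourseId { get; set; }
    public string CourseName { get; set; }
    public virtual ICollection<Student> Students { get; set; } = new List<Student>();
}

public class Student
{
    public int StudentId { get; set; }
    public string StudentName { get; set; }
    public int CourseId { get; set; }
    public virtual Course Course { get; set; }
}

// Placeholder context; provider and lazy-loading setup are supplied via options.
public class MyAppDbContext : DbContext
{
    public MyAppDbContext(DbContextOptions<MyAppDbContext> options) : base(options) { }

    public DbSet<Student> Students { get; set; }
    public DbSet<Course> Courses { get; set; }
}

// Hypothetical helper class, only here to hold the two variants.
public static class StudentQueries
{
    // Lazy: one query for the students, then a further query each time a
    // student's Course is touched and has not been loaded yet.
    // Assumes lazy-loading proxies are enabled; otherwise Course may be null.
    public static List<string> CourseNamesLazy(MyAppDbContext context)
    {
        var students = context.Students.ToList();
        return students.Select(s => s.Course.CourseName).ToList();
    }

    // Eager: Include loads the courses together with the students,
    // so reading s.Course afterwards does not hit the database again.
    public static List<string> CourseNamesEager(MyAppDbContext context)
    {
        var students = context.Students
            .Include(s => s.Course)
            .ToList();
        return students.Select(s => s.Course.CourseName).ToList();
    }
}
```

Both methods return the same names; the difference is only in how many round trips to the database it takes to get there.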

Steve Py