0

I have following linq query in my C# code , which is causing System.OutOfmemory excpetion

public SortedSet<string> GetstudentCount()
{
    var studentCount= studentRepository
                  .GetBy(i => i.IsStudentEnabled && i.IsEnrolledAllSubjects) 
                  .AsQueryable()
                  .AsNoTracking()
                  .ToList();

    var studentSortedSet= new SortedSet<string>();
    foreach (var student in studentCount)
    {
        var id = string.Format("{0}:{1}", student.studentFirstName, student.totalScore);
        studentSortedSet.Add(id);
    }

     return new SortedSet<string>(studentCount);
}

So i am trying to optimize it and could see these options, but I am quite not sure since in my development database i don't have enough data to test. I am new to Entity Framework and to Linq, its a bit confusing to me to figure out which way is correct.

1) Removed ToList() in Linq query, but foreach() is taking same time as before(still slow)

2) Tried to remove entire foreach() and added Select() in the Linq query itself, like:

public SortedSet<string> GetStudentCount()
{
    var studentCount= studentRepository
                          .GetBy(i => i.IsStudentEnabled && 
                           i.IsEnrolledAllSubjects)
                          .Select(string.Format("{0}:{1}",                          
                           student.studentFirstName, student.totalScore)) 
                          .AsQueryable()
                          .AsNoTracking()
                          .ToList();

      return new SortedSet<string>(studentCount);
}

But even this takes same time(still slow)

I further thought of removing ToList() here, but this method is used at many places( can confirm that no looping is done on studentCount) , which i am quite not sure if this may cause more issues.

Any suggestion/advice on this is more appreciable.

EDIT2:

public IEnumerable<T> GetBy(Expression<Func<T, bool>> predicate)
{
    return dbSet.Where(predicate);
}

EDIT:

It may be a basic question to many, I request them not to down vote this Question, as here I am trying to get some pointers on how to optimize. There are no hard feelings here. I am trying to learn the stuff and I would appreciate if this makes sense.Thank you

johnny 5
  • 19,893
  • 50
  • 121
  • 195
WonderHeart
  • 678
  • 10
  • 28
  • I am not sure why my question is down voted, at least i would expect a valid explanation here. . There are arounf ~320K records it needs to process, I don't have exact processing time. After couple of minutes, it just throws OutOfMemory exception, i think that gives enough idea how much is it slow. AsQueryable() is added , so that it is applicable to all the collection regardless if the collection is returned as IQueryable of T or IEnumerable of T – WonderHeart Jul 13 '18 at 11:47
  • @mjwills- SortedSet is used to get the data in a sorted manner. – WonderHeart Jul 13 '18 at 11:51
  • Downvotes? I don't know, maybe because not all relevant code is visible, like the `GetBy` method. Maybe because questions about performance tend to be unanswerable because it's impossible to know all parameters. – Gert Arnold Jul 13 '18 at 12:03
  • @mjwills- studentCount.Count() returns around ~320K records. As i mentioned earlier, ToList() on Linq query is slow, that why i am planning to optimize. – WonderHeart Jul 13 '18 at 12:15
  • We really need to know at least the result type of the `GetBy` method, because it's not a standard EF method and all "repository pattern" implementations do different things. The main question is whether it returns `IQueryable` or `IEnumerable`? – Ivan Stoev Jul 13 '18 at 13:24
  • @IvanStoev- Added it in the EDIT2 section – WonderHeart Jul 13 '18 at 13:58
  • Why are you returning an IEnumerable, instead of an IQueryable? – johnny 5 Jul 13 '18 at 14:12
  • Nothing in your code stands out as giving rise to OOM exceptions. The predicate in the `GetBy` method is executed as SQL in the database, so if it is restrictive enough you're not pulling much data into memory. How many records do you get with this predicate? To return `IEnumerable` and not `IQueryable` is your own choice. I can't judge if it's sensible or not, although the vain `AsNoTracking` call indicates it isn't. You also lose the ability to apply projections that will translate back to the generated SQL and, hence, the volume of data returned (see johnny 5's answer). – Gert Arnold Jul 13 '18 at 21:28
  • Ok, from the implementation of the `GetBy` implementation I see that the original EF `IQueryable` is recoverable, i.e. you can continue building server query rather than operating on materialized objects. The next thing which concerns me is the `totalScore` property. So, like any EF related issue, let start with the basic necessary information - we need to see the relevant entity model (`Student` class), fluent configuration if any, the target EF (EF6, EF Core, exact version), and depending of that, is lazy loading involved etc. – Ivan Stoev Jul 14 '18 at 08:51

2 Answers2

2

There are a few things you can do to optimize. But lets go over some issues in general first.

Don't use the repository anti-pattern, you're just writing a wrapper for something entityframework does.

Don't pull everything into memory, if you're getting an out of memory exception, (assuming you've done nothing wrong in your code else where), you're pulling too much data into memory, If you need to pull that much data. create a paging api.

Select only the data you need from the DB. As you've already discovered, pulling back the whole data set when you just need the Firstname and Total Score is a waste.

Pick Better data structure. There is no reason to use a sorted set of string. Also Its quite doubt full that you will get the results you want because people with single digit low scores will be sorted higher, because you're using an alpha sort e.g

Andy:93
Andy:8
Andy:79    

Your sort should probably be done on the SQL end When all of thats done you should have something that looks like this (Minus the paging):

 public class StudentScores
 {
     public string Name { get; set;}
     public string TotalScore {get; set; }
 }

 var results = dbContext.Students.AsNoTracking().Where(s => s.IsStudentEnabled 
                                             && s.IsEnrolledAllSubjects)
                .OrderBy(x => x.studentFirstName).ThenBy(x => x.totalScore)
                .Select(x => new StudentScores { 
                                       Name = x.studentFirstName,
                                       TotalScore = x.totalScore
                }).ToList();

There are other micro-optimizations that you can make, e.g Compiled queries, but Id just start with this for now.

(PS. My main guess is the error is because of something you're not showing us, It seems like you're not showing us the full code because Where is the student variable coming from in this line .Select(string.Format("{0}:{1}", student.studentFirstName, student.totalScore))

johnny 5
  • 19,893
  • 50
  • 121
  • 195
  • I didn't see optimize in your sample code. and the class could be instead with Anonymous_type in select: https://en.wikipedia.org/wiki/Anonymous_type – Dongdong Jul 13 '18 at 15:14
  • @Dongdong, The optimization is to not use a Sorted Set and Sort in SQL instead, My claim is the memory problem is else where, they're not showing use the full code look at the select. the OP should start by first fixing their architecture. – johnny 5 Jul 13 '18 at 15:17
  • @Dongdong, I also mention, they should additionally create a paging api. But the full code for that is out of scope for this problem – johnny 5 Jul 13 '18 at 15:20
2
  1. avoid to use linq in big query, but use stored procedure or functions instead.
  2. use pages to split query: YourFunction(...,int pageSize = 50, int page = 0), then, Get(...).Skip(pageSize *page).Take(pageSize). no page will show 100+ results
  3. if it's web app, split data and page, data will be ajax loaded
  4. use CompiledQuery.Compile: https://www.codeproject.com/Articles/38174/How-to-improve-your-LINQ-query-performance-by-X
  5. use Parallel query and yield return
  6. try to use RoslynLinqRewrite or LinqOptimiser mentioned by this post: http://mattwarren.org/2016/09/29/Optimising-LINQ/
  7. More performance details:
Dongdong
  • 2,208
  • 19
  • 28