
I am using LINQ in C# and I have a quick question.

I have shown only a minimal amount of code here; in the real code there are also some sorting operations. I would like to know which of the approaches below I should use.

Approach 1

    public class UserDetails
    {
        private dbContext db = new dbContext();

        public List<User> ActiveUser()
        {
            return db.Users.Where(m => m.Active == true).ToList();
        }

        public List<User> InActiveUser()
        {
            return db.Users.Where(m => m.Active == false).ToList();
        }
    }

Approach 2

    public class UserDetails
    {
        // Loads every user into memory once when the object is created.
        private List<User> allUser = new dbContext().Users.ToList();

        public List<User> ActiveUser()
        {
            return allUser.Where(m => m.Active == true).ToList();
        }

        public List<User> InActiveUser()
        {
            return allUser.Where(m => m.Active == false).ToList();
        }
    }

There are more than 20 methods fetching data, and every method queries the same table with a different where condition. My question is: should I create a dbContext and use a separate query in every method (Approach 1), or should I fetch all the data into one List and filter it inside each method with a where condition (Approach 2)?

Sudhir Dehade
  • 1) Always filter at the source of the data. In Approach 2, what would happen if you had 20 million users? All those records would be fetched into memory and then filtered in memory; that would be horrible for performance. – Igor Jun 26 '19 at 11:09
  • @Igor That means you are saying the first approach is better than the second. – Sudhir Dehade Jun 26 '19 at 11:11
  • 2) Look into using dependency injection, which can manage the lifetime of your instances, including releasing resources (calling Dispose on your disposable types). Then the type `UserDetails` would have a constructor that takes a `DbContext` and uses it (see the sketch after these comments). – Igor Jun 26 '19 at 11:11
  • @Igor I thought calling the DB every time creates more overhead on the server. – Sudhir Dehade Jun 26 '19 at 11:13
  • What do you think has more overhead? Calling ActiveUsers 10 times, retrieving 100 active users each time (even if they are the same ones), or calling it once and also getting the inactive users when there are 10 million of those? It really just depends. Also, approach 2 will always use stale data, depending on how often `UserDetails` is created (*if it is a singleton it would be almost worthless as a type*). If it was called conditionally, approach 2 would also add overhead for future calls that are never made. – Igor Jun 26 '19 at 11:14
  • You need to know the difference between IQueryable and IEnumerable; this topic will be very helpful for understanding: https://stackoverflow.com/questions/2876616/returning-ienumerablet-vs-iqueryablet – Basil Kosovan Jun 26 '19 at 11:15
  • The first approach is better because 1) you materialize data only on request, and 2) another thread may update data in the DB; with the second approach you will never see that update. – Aleks Andreev Jun 26 '19 at 11:16
  • @Igor Consider that there are more methods; then every time it will create a separate request and make a round trip. With the second approach I know it will take time to load, but only once, and after that the performance will be faster... I am naive, so if I am wrong then please explain. – Sudhir Dehade Jun 26 '19 at 11:17
  • Another issue with the second approach is that your records will get stale. What happens if something updates the Active state of a user but you're still working with the old state that you pulled into a list? Basically you want to pull as few records as possible, and then you can implement some type of caching (if necessary) where you control how long you keep data in memory before going to the DB to get fresh data (a rough sketch of such a cache follows after these comments). – juharr Jun 26 '19 at 11:17
  • It sounds like you already had your mind made up before you even posted the question. Users have given you multiple pros and cons for the approaches in the answers and comments. Do what you want; it is your code. – Igor Jun 26 '19 at 11:21
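As a rough illustration of Igor's dependency-injection suggestion, here is a minimal sketch. The `AppDbContext` name, the constructor shape, and the commented-out registration calls are assumptions for illustration, not code from the question:

    using System.Collections.Generic;
    using System.Linq;

    public class UserDetails
    {
        private readonly AppDbContext db;   // assumed EF DbContext type, supplied by the container

        // The DI container creates the context, passes it in, and disposes it when the
        // scope ends, so UserDetails never news up (or forgets to dispose) its own context.
        public UserDetails(AppDbContext db)
        {
            this.db = db;
        }

        public List<User> ActiveUser()   => db.Users.Where(m => m.Active).ToList();
        public List<User> InActiveUser() => db.Users.Where(m => !m.Active).ToList();
    }

    // Possible ASP.NET Core style registration (illustrative only):
    // services.AddDbContext<AppDbContext>(o => o.UseSqlServer(connectionString));
    // services.AddScoped<UserDetails>();

The point is only that the context's lifetime (and its Dispose call) is handled in one place instead of inside every class that queries it.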
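And here is a minimal sketch of the kind of time-bounded caching juharr mentions, in case loading the list into memory ever becomes worthwhile. The class name, field names, and the five-minute window are made up for illustration; only `dbContext`, `Users`, and `Active` come from the question:

    using System;
    using System.Collections.Generic;
    using System.Linq;

    public class CachedUserDetails
    {
        private readonly dbContext db = new dbContext();   // context type from the question
        private static readonly TimeSpan MaxAge = TimeSpan.FromMinutes(5);

        private List<User> cache;
        private DateTime loadedAt;

        private List<User> AllUsers()
        {
            // Reload from the database once the cached copy is older than MaxAge,
            // so stale data is bounded instead of living for the object's lifetime.
            if (cache == null || DateTime.UtcNow - loadedAt > MaxAge)
            {
                cache = db.Users.ToList();
                loadedAt = DateTime.UtcNow;
            }
            return cache;
        }

        public List<User> ActiveUser()   => AllUsers().Where(m => m.Active).ToList();
        public List<User> InActiveUser() => AllUsers().Where(m => !m.Active).ToList();
    }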

3 Answers


It depends on the circumstances. In general, I would prefer approach 1, as you usually can't predict how many users are in your database, and you would pull all users into memory, which causes a big memory overhead. Also, with large numbers of users you take advantage of database optimisations like indexes and query execution plans.

If we are talking about a small number of users, approach 2 might be the more performant one, as you reduce the number of round trips to the database. But besides the memory overhead, it also has other issues, like caching the users and missing database updates.

However, I'd almost always prefer approach 1, as it's good practice to do most of the filtering and sorting work in the database, which is optimized for this kind of work. With approach 2 you might get into trouble as your user base grows over time, and it will be hard to track down caching or memory issues. Also, the difference between one and two database round trips is mostly negligible unless you have a really bad connection or do it many times.
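To make the "filter in the database" point concrete, here is a small sketch of where the work happens in each approach. The class and method names are illustrative; only `dbContext`, `Users`, and `Active` come from the question:

    using System.Collections.Generic;
    using System.Linq;

    public class UserQueries
    {
        private readonly dbContext db = new dbContext();

        // Approach 1: the Where clause stays an IQueryable and is translated to SQL,
        // so only the matching rows ever leave the database.
        public List<User> ActiveViaDatabase()
        {
            IQueryable<User> query = db.Users.Where(m => m.Active);  // deferred, nothing executed yet
            return query.ToList();                                   // runs roughly: SELECT ... WHERE Active = 1
        }

        // Approach 2: ToList() materialises the whole table first, then LINQ-to-Objects
        // filters the list in memory, without any help from database indexes.
        public List<User> ActiveViaMemory()
        {
            List<User> all = db.Users.ToList();                      // SELECT * over the entire table
            return all.Where(m => m.Active).ToList();                // in-memory filtering
        }
    }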

Christoph Sonntag
  • "It depends on the circumstances" where can we find exact mathematics for this? I mean how much memory required for x amount of data and How much time it will take to complete one request, How they are related??? – Sudhir Dehade Jun 26 '19 at 11:52
  • In 99.999% of cases you should use Approach 1 @SudhirDehade. If Approach 1 is slow then _consider_ Approach 2. But always **start** with Approach 1. At your stage, I'd suggest always using Approach 1 and stop worrying about it. – mjwills Jun 26 '19 at 11:56
  • What @mjwills says is the approach you should take. If you want exact numbers you need to write a performance test for your exact case, with automatic data seeding, and you will see exactly how performance and memory usage change for different data sizes (a rough sketch of such a test follows below). However, making your code as simple and as stable and error-free as possible, as long as you don't have noticeable performance issues, is far easier than calculating artificial numbers that don't necessarily represent your real-world experience, since you can't know things like network speed or user numbers beforehand. – Christoph Sonntag Jun 26 '19 at 12:14
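If you do want numbers for your own data, a rough sketch of the kind of measurement Christoph describes could look like the following. The class name is made up, seeding the test data is left out, and a dedicated tool such as BenchmarkDotNet would give more reliable results than a single Stopwatch run:

    using System;
    using System.Diagnostics;
    using System.Linq;

    public static class ApproachTimings
    {
        public static void Run()
        {
            using (var db = new dbContext())   // context type from the question
            {
                var sw = Stopwatch.StartNew();
                var filtered = db.Users.Where(m => m.Active).ToList();   // Approach 1: filter in the database
                Console.WriteLine($"Approach 1: {sw.ElapsedMilliseconds} ms for {filtered.Count} rows");

                sw.Restart();
                var all = db.Users.ToList();                             // Approach 2: load everything,
                var inMemory = all.Where(m => m.Active).ToList();        // then filter in memory
                Console.WriteLine($"Approach 2: {sw.ElapsedMilliseconds} ms, {inMemory.Count} of {all.Count} rows kept");
            }
        }
    }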

Never load unnecessary data. With Approach 2 you load an unnecessary amount of data.

Y.Y.

I personally go with the first approach, because with the first approach the filtered data is fetched from the database, while with the second approach all the data is fetched and the filtering is done at the framework level instead of the database level. So with the first approach we get the performance benefit.

Shobhit Walia