0

Initially I have such a list :

List<Car> cars = db.Car.Where(x => x.ProductionYear == 2005).ToList();

Then I'm trying to join this list with two large tables using LINQ like this :

var joinedList = (from car in cars
                  join driver in db.Driver.ToList() 
                    on car.Id equals driver.CarId
                  join building in db.Building.ToList() 
                    on driver.BuildingId equals building.Id
                  select new Building
                  {
                     Name = building.Name;
                     Id = building.Id;
                     City = building.City;
                  }).ToList();

Both Driver and Building tables have about 1 million rows. When I run this join I get out of memory exception. How can I make this join work? Should I make the join operation on database? If yes, how can I carry cars list to the db? Thanks in advance.

Zein Makki
  • 29,485
  • 6
  • 52
  • 63
jason
  • 6,962
  • 36
  • 117
  • 198

3 Answers3

3

Even if you remove the .ToList() calls inside your join, you code will still pull all the data and perform the join in-memory and not in SQL server. This is because you're using a local list cars in your join. The below should solve your problem:

var joinedList = (from car in db.Car.Where(x => x.ProductionYear == 2005)
                  join driver in db.Driver 
                    on car.Id equals driver.CarId
                  join building in db.Building 
                    on driver.BuildingId equals building.Id
                  select new Building
                  {
                     Name = building.Name;
                     Id = building.Id;
                     City = building.City;
                  }).ToList();

You can remove the last .ToList() and do some paging if you expect to get too many records in the results.

Zein Makki
  • 29,485
  • 6
  • 52
  • 63
  • Now it's using less memory but it still consumes more than 1 GB of memory. Isn't there a way to reduce this? And is there a way to free memory after I'm done with the join? Thanks for the answer. – jason Jun 30 '17 at 08:27
  • @jason `joinedList ` should be everything that you need to worry about after the above code executes. Everything that is not used anymore is marked for garbage collection (and will be collected at some point in the future when GC is fired). However, if `joinedList ` has too many items, check the last line in my answer and implement paging. – Zein Makki Jun 30 '17 at 08:31
  • In one case, I have got out-of-memory exception in that join. I did exactly what you have done. – jason Jun 30 '17 at 08:32
  • @jason The above answer solves the problem where `db.Driver`, `db.Building` and `db.Card` have too many records and you're pulling these into memory because you're doing `join` the wrong way.However If the return result is a huge list that doesn't fit into memory, then paging is the solution for that. Remember, `select new Building` is creating objects in memory, if you have too many (More than 2 GB I guess, you'll get out-of-memory exception) Paging solves that by creating portions of the objects on-demand. – Zein Makki Jun 30 '17 at 08:39
  • I don't know why but when I add `AsQueryable()` after tables, it worked very efficiently. – jason Jun 30 '17 at 09:19
  • @jason This doesn't make sense for me, `db.Driver` is already an `IQueryable` derivative and that method just does casting if applicable. There has to be another factor that is affecting your execution testing. – Zein Makki Jun 30 '17 at 09:30
2

even If You have removed .ToList() replace in .AsQueryable()

AsQueryable Faster then ToList And AsEnumerable enter image description here

enter image description here

  • If you create an IQueryable, then the query may be converted to sql and run on the database server

  • If you create an IEnumerable, then all rows will be pulled into
    memory as objects before running the query.

  • In both cases if you don't call a ToList() or ToArray() then query
    will be executed each time it is used, so, say, you have an
    IQueryable and you fill 4 list boxes from it, then the query will be run against the database 4 times.

so following Used Linq query

var joinedList = (from car in db.Car.Where(x => x.ProductionYear == 2005).AsQueryable()
              join driver in db.Driver.AsQueryable() 
                on car.Id equals driver.CarId
              join building in db.Building.AsQueryable() 
                on driver.BuildingId equals building.Id
              select new Building
              {
                 Name = building.Name,
                 Id = building.Id,
                 City = building.City,
              }).ToList();
kari kalan
  • 497
  • 3
  • 20
1

First don't ever try ToList() while using LINQ(you can) but make sure that you use ToList() as less as possible in a very rare scenarios only. Every time you will get OutOfMemoryException when the table contains many rows. So, here is the code for your question:

var joinedList = (from car in db.Car.GetQueryable().Where(x => x.ProductionYear == 2005)
              join driver in db.Driver.GetQueryable() on car.Id equals driver.CarId
              join building in db.Building.GetQueryable() on driver.BuildingId equals building.Id
              select new Building
              {
                 Name = building.Name;
                 Id = building.Id;
                 City = building.City;
              }).ToList();
  • 1
    *in a very rare scenarios* .. This is wrong. You can do multiple enumerations if you follow this advice which impact performance and does round-trips. The good advise is to understand deferred execution, then you can know when to use `.ToList()` and when not. – Zein Makki Jun 30 '17 at 07:43
  • Yes exactly i agree with you , if you are good coder then you will know when to use and where to use as per your need after understanding execution. Sometimes its helpful based on the business logic. – Starlord Live Jun 30 '17 at 07:53