3

I have the following (generalized) query:

var listOfPossibleCars= new List<string>();
var listOfCars = db.Cars.Where(s => listOfPossibleCars.Contains(s.CarName)).ToList();

But listOfPossibleCars is large any contains hundreds of thousands of records. This query is causing a delay and I'm wondering how it can be improved.

Gilad Green
  • 36,708
  • 7
  • 61
  • 95
SB2055
  • 12,272
  • 32
  • 97
  • 202
  • Is listOfPossibleCars always the same, or it's changing often (every call)? – Evk Nov 05 '16 at 19:28
  • @Evk it's different every time, and big every time – SB2055 Nov 05 '16 at 19:47
  • How big is the whole Cars table? Might it be feasible to bring it into memory? Otherwise i would recommend having a temp table, which you SqlBulkIncert into, and then do a join like @Evik answered, although, I would recommend having 2 tables, 1 with a query ID, and then you insert into the Filter table as (int FilterID, varchar Value). So you can identify each filter uniquely rather than needing to lock it so that no one tries to update it concurrently – Michal Ciechan Nov 05 '16 at 21:26
  • @MichalCiechan millions of rows, so unfortunately in-memory isn't an option there. – SB2055 Nov 05 '16 at 21:54
  • 1
    Possible duplicate: http://stackoverflow.com/q/24534217/861716 – Gert Arnold Nov 05 '16 at 22:08
  • @GertArnold that's what I was looking for. – SB2055 Nov 06 '16 at 02:29

2 Answers2

2

Instead of doing this with an in-memory collection insert the data of listOfPossibleCars into the database to a temporary table with properly defined indexes and partitions and have this happen in the database. With properly defined tables the database will perform a hash-join.

After data is in the database the query will look like:

//Instead data to database
from car in db.Cars
join possibleCar in db.PossibleCars on car.CarName equals possibleCar.CarName
select car;
Gilad Green
  • 36,708
  • 7
  • 61
  • 95
  • The problem is that I need change tracking on these entities with EF... any idea how I can achieve this with an EF query? – SB2055 Nov 05 '16 at 19:09
  • Inserting the list into the database is just so the DB can do the filtering more efficient. Whatever you want to do with the `cars` later you can still do – Gilad Green Nov 05 '16 at 19:11
  • 1
    Note that OP makes entity framework query, not LINQ to objects, so changing List to HashSet will have no effect. – Evk Nov 05 '16 at 19:35
  • You might think I'm downvoter because of that HashSet but that's not the case, and real downvoter most likely won't see your comment at all... – Evk Nov 05 '16 at 20:06
  • @Evk - Didn't think it was you :) no worries – Gilad Green Nov 05 '16 at 20:07
  • 1
    I would highly recommend using SqlBulkInsert rather than standard EF Insert (Add(...)). I would also recommend adding a StringFilter (int Id identity, varchar name), and a StringFilterValue (int FilterId, varchar Value). Insert a row into StringFilter, to get the AutoID, then BCP(SqlBulkInsert) into the FilterValue table with the filter ID and all the cars. This takes a matter of seconds(if not sub) for 100k's rows – Michal Ciechan Nov 05 '16 at 22:39
2

If listOfPossibleCars is relatively static (does not change at all or at least not often), you can indeed put it into database as described in another answer. When it's not the case and assuming you use sql server, you can use table valued parameters. Note that Entity Framework is VERY slow with Contains, especially on long lists.

First create custom table type in database:

CREATE TYPE MyType AS TABLE 
(
    CarName varchar(200) primary key    
)

Then query like this (note that this is code right from my head and not tested, so ask if something doesn't go well):

var listOfPossibleCars = new List<string>();
var dt = new DataTable();
dt.Columns.Add("CarName");
foreach (var car in listOfPossibleCars) {
     dt.Rows.Add(car);
}
var possibleCars = new SqlParameter("possibleCars", SqlDbType.Structured);
possibleCars.Value = dt;
possibleCars.TypeName = "dbo.MyType";

var listOfCars = db.Cars.SqlQuery("select C.* from Cars C inner join @possibleCars P on C.CarName = P.CarName", possibleCars).ToList();

Yes you have to use raw sql query, but if your list is really huge - you cannot do such query efficiently with Entity Framework alone.

Evk
  • 98,527
  • 8
  • 141
  • 191
  • Does retrieving with SqlQuery retain change-tracking? – SB2055 Nov 05 '16 at 20:17
  • Database.SqlQuery returns non-tracked entities. However you can use DbSet.SqlQuery which returns tracked entities (so, db.Cars.SqlQuery). See my updated answer. – Evk Nov 05 '16 at 20:23
  • @MichalCiechan we are not inserting anything here at all, what are you talking about? – Evk Nov 05 '16 at 21:33