0

Let's say I have some table named foo and I want to filter out some rows based on a list of id's.

var idsList = new List<int> { 1, 4, ..., 9999 }

I can't use contains statement due to performance issues with query plan pollution context.Foos.Where(x => idsList.Contains(x.Id)) so I'm trying to solve this with advanced linq like:

from a in context.Foos
join b in idsList on a.Id equals b
select a;

or

from a in context.Foos
from b in idsList
where a.Id == b
select a;

but that does not translate.

I'm aiming to produce join between Foos and temp table of id's.

Kilas
  • 176
  • 11
  • 1
    Take a look at [this](https://stackoverflow.com/a/70765466/2501279) answer. – Guru Stron Mar 16 '23 at 09:57
  • 1
    EF Core doesn't deal with tables, it deals with entities. There's no database table to JOIN to. `due to performance issues with query plan pollution` what does this mean? You'd get a worse execution plan if you used a table-valued parameter or variable because there are statistics. `IN (...)` results in *better* execution plans because the database engine knows what data is available. – Panagiotis Kanavos Mar 16 '23 at 10:02
  • What are you actually trying to do? It matters. It's highly unusual to have 10K items in a list, much less want to use them for filtering. ORMs are meant to map graphs of objects to tables, not pull 10K rows. That's an ETL job. When you want to filter by 10K items the problem isn't query plan pollution, it's the lack of indexing. The way to solve this is to import the data into a staging/temporary table with proper indexes and execute whatever SQL you want to get the data. There are no objects involved in this – Panagiotis Kanavos Mar 16 '23 at 10:06
  • @PanagiotisKanavos Check out link in my comment - it seems there is an issue with translation of `Contains` for SQL Server which leads to query plan pollution (i.e. separate plan for every unique `idsList.Count()` if I understand correctly). – Guru Stron Mar 16 '23 at 10:06
  • @GuruStron `Contains` allows only up to 2100 items so the query would fail outright. That's not the point though - trying to create an `IN` with 2K items is almost always a sign that ORMs are misused. The other options are to use a table-valued parameter (not ideal) or staging tables. – Panagiotis Kanavos Mar 16 '23 at 10:14
  • @PanagiotisKanavos when using `{1, 2}.Contains(x)` and `{2, 3}.Contains(x)` creates two different query plans due to list values beeing translated as constants rather than parameters. I resoved this by using `LinqKit.PredicateBuilder Or(...)` inside `foreach(id in idsList)` So basically Foo is my dictionary between external `Id` and `Code` that is used in my application. Foo table has 15M records and there are cases when I need to query 2k different ids smth like : `Select * from Foos inner join Bars on Foos.Code = Bars,Code where Foos.Id in (1, 2, 3, ..., 2000)` – Kilas Mar 16 '23 at 10:23
  • As for that linked answer, it seems to be doing something similar with passing comma separated value and splitting them on the server, except it uses XML. JSON would be faster in current SQL Server versions but there are no indexes or statistics. It's an option if you can't modify the database but query performance will suffer – Panagiotis Kanavos Mar 16 '23 at 10:24
  • @Kilas I understand that but your example is an even stronger point **against** using ORMs for ETL jobs. So once again, what are you doing, and why are you using EF instead of the proper tools? A 1M row staging table is peanuts to even a laptop. An ORM starts choking even with a few hundred items. SqlBulkCopy can fill a staging table using minimal logging while ORMs can only perform fully logged operations, which is several times slower. Worse, JOINs without indexes are orders of magnitude slower – Panagiotis Kanavos Mar 16 '23 at 10:25
  • @PanagiotisKanavos legacy project – Kilas Mar 16 '23 at 10:27
  • @PanagiotisKanavos The contains statement can contain any number of items because EF translates in into an IN statement with values, not parameters (as some ORMs do). But that causes plan pollution and terrible performance when the number of items is very large. There are situation though in which there's no better option than to generate IN statements, but it should be done in chunks of some optimal size. – Gert Arnold Mar 16 '23 at 10:33
  • ORMs are newcomers so the real legacy way of doing things is faster than this. Nothing forces you to use EF either. You can use Dapper, write SQL, and use SqlBulkCopy. If all you need is load a bunch of objects, Dapper is more than enough. You can even use `STRIN_SPLIT` with a comma-separated string – Panagiotis Kanavos Mar 16 '23 at 10:35
  • I just found the [InListStringSplitCount](https://github.com/DapperLib/Dapper/blob/a31dfd3dd4d7f3f2580bd33c877199d7ef3e3ef9/Dapper/SqlMapper.Settings.cs#L95) setting in Dapper - if set and the number of items in a list is greater, instead of parameters a CSV value will be used along with STRING_SPLIT – Panagiotis Kanavos Mar 16 '23 at 10:36
  • @GertArnold I realize this but my point is that ORMs are the wrong tool for ETL and that if the OP explains the actual problem, we can find a better solution. I also thought about batching if the actual problem is how to load rows. Even though a 10K selection list is suspicious. Even assuming no TVPs/staging tables can be used, `STRING_SPLIT` or, if the data is actually a lookup, JSON can be used to pass the data. If the 10K lookup items came from the same DB, why not use a JOIN in the database? – Panagiotis Kanavos Mar 16 '23 at 10:40
  • As @PanagiotisKanavos has stated numerous times, you do not want to use an ORM to perform large updates, and are instead describing an ETL Framework. There are numerous libraries to assist with this, but ETL.NET is one particular open-source library that can leverage Entity Framework Core's connection configuration to create performant upserts with temp tables. If you want an answer involving temp tables, you'll need to declare what database engine you're using (SQL server, MySQL, etc) – Austin Arnett Mar 16 '23 at 16:12
  • I get that ORM is not designed for this but it is ORM atm. my database is SQL Server 11.0 – Kilas Mar 17 '23 at 10:45

0 Answers0