I only need it to work for SQL Server. This is an example. The question is about a general approach.
There is a nice extension method from https://entityframework-extensions.net called WhereBulkContains
. It is, sort of, great, except that the code of the methods in this library is obfuscated and they do not produce valid SQL when .ToQueryString()
is called on IQueryable<T>
with these extension methods applied.
Subsequently, I can't use such methods in production code as I am not "allowed" to trust such code due to business reasons. Sure, I can write tons of tests to ensure that WhereBulkContains
works as expected, except that there are some complicated cases where the performance of WhereBulkContains
is well below stellar, whereas properly written SQL works in a blink of an eye. And (read above), since the code of this library is obfuscated, there is no way to figure out what's wrong there without spending a large amount of time. We would've bought the license (as this is not a freeware) if the library weren't obfuscated. All together that basically kills the library for our purposes.
This is where it gets interesting. I can easily create and populate a temporary table, e.g. (I have a table called EFAgents
with an int PK called AgentId
in the database):
private string GetTmpAgentSql(IEnumerable<int> agentIds) => @$"
drop table if exists #tmp_Agents;
create table #tmp_Agents (AgentId int not null, primary key clustered (AgentId asc));
{(agentIds
.Chunk(1_000)
.Select(e => $@"
insert into #tmp_Agents (AgentId)
values
({e.JoinStrings("), (")});
")
.JoinStrings(""))}
select 0 as Result
";
private const string AgentSql = @"
select a.* from EFAgents a inner join #tmp_Agents t on a.AgentID = t.AgentId";
where GetContext
returns EF Core database context and JoinStrings
comes from Unity.Interception.Utilities
and then use it as follows:
private async Task<List<EFAgent>> GetAgents(List<int> agentIds)
{
var tmpSql = GetTmpAgentSql(agentIds);
using var ctx = GetContext();
// This creates a temporary table and populates it with the ids.
// This is a proprietary port of EF SqlQuery code, but I can post the whole thing if necessary.
var _ = await ctx.GetDatabase().SqlQuery<int>(tmpSql).FirstOrDefaultAsync();
// There is a DbSet<EFAgent> called Agents.
var query = ctx.Agents
.FromSqlRaw(AgentSql)
.Join(ctx.Agents, t => t.AgentId, a => a.AgentId, (t, a) => a);
var sql = query.ToQueryString() + Environment.NewLine;
// This should provide a valid SQL; https://entityframework-extensions.net does NOT!
// WriteLine - writes to console or as requested. This is irrelevant to the question.
WriteLine(sql);
var result = await query.ToListAsync();
return result;
}
So, basically, I can do what I need in two steps:
using var ctx = GetContext();
// 1. Create a temp table and populate it - call GetTmpAgentSql.
...
// 2. Build the join starting from `FromSqlRaw` as in example above.
This is doable, half-manual, and it is going to work.
The question is how to do that in one step, e.g., call:
.WhereMyBulkContains(aListOfIdConstraints, whateverElseIsneeded, ...)
and that's all.
I am fine if I need to pass more than one parameter in each case in order to specify the constraints.
To clarify the reasons why do I need to go into all these troubles. We have to interact with a third party database. We don't have any control of the schema and data there. The database is large and poorly designed. That resulted in some ugly EFC LINQ queries. To remedy that, some of that ugliness was encapsulated into a method, which takes IQueryable<T>
(and some more parameters) and returns IQueryable<T>
. Under the hood this method calls WhereBulkContains
. I need to replace this WhereBulkContains
by, call it, WhereMyBulkContains
, which would be able to provide correct ToQueryString
representation (for debugging purposes) and be performant. The latter means that SQL should not contain in
clause with hundreds (and even sometimes thousands) of elements. Using inner join
with a [temp] table with a PK and having an index on the FK field seem to do the trick if I do that in pure SQL. But, ... I need to do that in C# and effectively in between two LINQ method calls. Refactoring everything is also not an option because that method is used in many places.
Thanks a lot!