0

I only need it to work for SQL Server. This is an example. The question is about a general approach.

There is a nice extension method from https://entityframework-extensions.net called WhereBulkContains. It is, sort of, great, except that the code of the methods in this library is obfuscated and they do not produce valid SQL when .ToQueryString() is called on IQueryable<T> with these extension methods applied.

Subsequently, I can't use such methods in production code as I am not "allowed" to trust such code due to business reasons. Sure, I can write tons of tests to ensure that WhereBulkContains works as expected, except that there are some complicated cases where the performance of WhereBulkContains is well below stellar, whereas properly written SQL works in a blink of an eye. And (read above), since the code of this library is obfuscated, there is no way to figure out what's wrong there without spending a large amount of time. We would've bought the license (as this is not a freeware) if the library weren't obfuscated. All together that basically kills the library for our purposes.

This is where it gets interesting. I can easily create and populate a temporary table, e.g. (I have a table called EFAgents with an int PK called AgentId in the database):

private string GetTmpAgentSql(IEnumerable<int> agentIds) => @$"
drop table if exists #tmp_Agents;
create table #tmp_Agents (AgentId int not null, primary key clustered (AgentId asc));
{(agentIds
    .Chunk(1_000)
    .Select(e => $@"
insert into #tmp_Agents (AgentId)
values
({e.JoinStrings("), (")});
")
    .JoinStrings(""))}

select 0 as Result
";

private const string AgentSql = @"
select a.* from EFAgents a inner join #tmp_Agents t on a.AgentID = t.AgentId";    

where GetContext returns EF Core database context and JoinStrings comes from Unity.Interception.Utilities and then use it as follows:

private async Task<List<EFAgent>> GetAgents(List<int> agentIds)
{
    var tmpSql = GetTmpAgentSql(agentIds);

    using var ctx = GetContext();

    // This creates a temporary table and populates it with the ids.
    // This is a proprietary port of EF SqlQuery code, but I can post the whole thing if necessary. 
    var _ = await ctx.GetDatabase().SqlQuery<int>(tmpSql).FirstOrDefaultAsync();

    // There is a DbSet<EFAgent> called Agents.
    var query = ctx.Agents
        .FromSqlRaw(AgentSql)
        .Join(ctx.Agents, t => t.AgentId, a => a.AgentId, (t, a) => a);

    var sql = query.ToQueryString() + Environment.NewLine;

    // This should provide a valid SQL; https://entityframework-extensions.net does NOT!
    // WriteLine - writes to console or as requested. This is irrelevant to the question. 
    WriteLine(sql);

    var result = await query.ToListAsync();
    return result;
}

So, basically, I can do what I need in two steps:

using var ctx = GetContext();

// 1. Create a temp table and populate it - call GetTmpAgentSql.
...

// 2. Build the join starting from `FromSqlRaw` as in example above.

This is doable, half-manual, and it is going to work.

The question is how to do that in one step, e.g., call:

.WhereMyBulkContains(aListOfIdConstraints, whateverElseIsneeded, ...)

and that's all.

I am fine if I need to pass more than one parameter in each case in order to specify the constraints.

To clarify the reasons why do I need to go into all these troubles. We have to interact with a third party database. We don't have any control of the schema and data there. The database is large and poorly designed. That resulted in some ugly EFC LINQ queries. To remedy that, some of that ugliness was encapsulated into a method, which takes IQueryable<T> (and some more parameters) and returns IQueryable<T>. Under the hood this method calls WhereBulkContains. I need to replace this WhereBulkContains by, call it, WhereMyBulkContains, which would be able to provide correct ToQueryString representation (for debugging purposes) and be performant. The latter means that SQL should not contain in clause with hundreds (and even sometimes thousands) of elements. Using inner join with a [temp] table with a PK and having an index on the FK field seem to do the trick if I do that in pure SQL. But, ... I need to do that in C# and effectively in between two LINQ method calls. Refactoring everything is also not an option because that method is used in many places.

Thanks a lot!

  • Consider using a Table Valued Parameter, which acts like a table variable and is far more efficient than individual `INSERT` statements. – Charlieface Aug 01 '22 at 23:18
  • @Charlieface The question is NOT about performance but about how to do that in one step. – Konstantin Konstantinov Aug 01 '22 at 23:21
  • 2
    I know, I'm just saying that I think you are going down the wrong road. I don't think you need to muck around with temp tables anyway, you can just pass a TVP like any other parameter to `FromSqlRaw`. A corollary of that is: don't bother with EF and Linq in complex scenarios, just write a raw SQL query doing what you want. – Charlieface Aug 01 '22 at 23:25
  • 1
    And if you don't want to bother declaring a Table Type, you can pass the values as JSON in an NVarchar(max) parameter and read them using OPENJSON. – David Browne - Microsoft Aug 02 '22 at 00:12
  • If you do not stick with ZZZ extensions, I can propose other extension, which supports temporary tables natively and you even can create generic function, which retrieves records by id's. Anyway, for which purpose do you use the temporary tables for such select? What you are doing later with these records? – Svyatoslav Danyliv Aug 02 '22 at 11:17
  • @Charlieface I clarified the question to explain the reasons. – Konstantin Konstantinov Aug 02 '22 at 12:10
  • @SvyatoslavDanyliv I clarified the question. And I'd be glad to take a look at that other library. Thanks. – Konstantin Konstantinov Aug 02 '22 at 12:29
  • If you really want to stick with temp tables, probably best to create and insert into them separately in a function (using `SqlBulkCopy` for example), then execute your join using `FromSqlRaw` like in the below answer, then drop the temp table. – Charlieface Aug 02 '22 at 12:46

2 Answers2

3

I think you really want to use a Table Valued Parameter.

Creating an SqlParameter from an enumeration is a little fiddly, but not too difficult to get right;

CREATE TYPE [IntValue] AS TABLE (
    Id int NULL
)
private IEnumerable<SqlDataRecord> FromValues(IEnumerable<int> values)
{
    var meta = new SqlMetaData(
        "Id",
        SqlDbType.Int
    );
    foreach(var value in values)
    {
        var record  = new SqlDataRecord(
            meta
        );
        record.SetInt32(0, value);
        yield return record;
    }
}
public SqlParameter ToIntTVP(IEnumerable<int> values){
    return new SqlParameter()
    {
        TypeName = "IntValue",
        SqlDbType = SqlDbType.Structured,
        Value = FromValues(values)
    };
}

Personally I would define a query type in EF Core to represent the TVP. Then you can use raw sql to return an IQueryable.

public class IntValue
{
    public int Id { get; set; }
}
modelBuilder.Entity<IntValue>(e =>
{
    e.HasNoKey();
    e.ToView("IntValue");
});

IQueryable<IntValue> ToIntQueryable(DbContext ctx, IEnumerable<int> values)
{
    return ctx.Set<IntValue>()
        .FromSqlInterpolated($"select * from {ToIntTVP(values)}");
}

Now you can compose the rest of your query using Linq.

var ids = ToIntQueryable(ctx, agentIds);
var query = ctx.Agents
    .Where(a => ids.Any(i => i.Id == a.Id));
Jeremy Lakeman
  • 9,515
  • 25
  • 29
  • Nope. 1. `Any` is effectively a no go in EFC when large number of elements is involved because `Any` is compiled into `in` SQL clause and performance of that clause degrades substantially when the number of elements is large. Try running that for e.g. 10K elements and hundreds of millions of rows (or more) in the main table and you will see what I mean. 2. A temp table with PK and an inner join to another table by an indexed (in that other table) field is what gives decent performance (<1 second). So far nothing else did (40+ seconds to return 0 rows). – Konstantin Konstantinov Aug 02 '22 at 10:17
  • @KonstantinKonstantinov 1. Not in this case: it should be compiled into the raw `IN (select * from @p0)`, that's waht the `ToIntQueryable` does. 2. Try adding a primary key to the table type `CREATE TYPE [IntValue] AS TABLE (Id int PRIMARY KEY)` that should improve performance. Also consider enabling Trace Flag 2453, see https://www.brentozar.com/archive/2017/02/using-trace-flag-2453-improve-table-variable-performance/ – Charlieface Aug 02 '22 at 12:44
  • I tweaked your example to use PK and `Join` instead of `Any` and now it compiles into: `SELECT [e].[AgentId],... FROM [dbo].[EFAgents] AS [e] INNER JOIN (select * from @p0) AS [s] ON [e].[AgentId] = [s].[Id]'`, which is what I'd expect. Thanks! – Konstantin Konstantinov Aug 02 '22 at 13:42
  • Yeah, there's more than one way to compose the final query. The interesting trick is building the `IQueryable`. – Jeremy Lakeman Aug 03 '22 at 01:53
1

I would propose to use linq2db.EntityFrameworkCore (note that I'm one of the creators). It has built-in temporary tables support.

We can create simple and reusable function which filters records of any type:

public static class HelperMethods
{
    private class KeyHolder<T>
    {
        [PrimaryKey]
        public T Key { get; set; } = default!;
    }

    public static async Task<List<TEntity>> GetRecordsByIds<TEntity, TKey>(this IQueryable<TEntity> query, IEnumerable<TKey> ids, Expression<Func<TEntity, TKey>> keyFunc)
    {
        var ctx = LinqToDBForEFTools.GetCurrentContext(query) ??
                        throw new InvalidOperationException("Query should be EF Core query");

        // based on DbContext options, extension retrieves connection information
        using var db = ctx.CreateLinqToDbConnection();

        // create temporary table and BulkCopy records into that table
        using var tempTable = await db.CreateTempTableAsync(ids.Select(id => new KeyHolder<TKey> { Key = id }), tableName: "temporaryIds");

        var resultQuery = query.Join(tempTable, keyFunc, t => t.Key, (q, t) => q);

        // we use ToListAsyncLinqToDB to avoid collission with EF Core async methods.
        return await resultQuery.ToListAsyncLinqToDB();
    }
}

Then we can rewrite your function GetAgents to the following:

private async Task<List<EFAgent>> GetAgents(List<int> agentIds)
{
    using var ctx = GetContext();

    var result = await ctx.Agents.GetRecordsByIds(agentIds, a => a.AgentId);

    return result;
}
Svyatoslav Danyliv
  • 21,911
  • 3
  • 16
  • 32
  • What is a `KeyHolder`? The rest is clear. Thanks. – Konstantin Konstantinov Aug 02 '22 at 15:46
  • Just class, we cannot create table from just List of ints. So created intermediate class. – Svyatoslav Danyliv Aug 02 '22 at 15:50
  • @KonstantinKonstantinov, made several changes to make index in temporary table and removed disposing `DbContext`. – Svyatoslav Danyliv Aug 02 '22 at 16:10
  • The call `var ctx = LinqToDBForEFTools.GetCurrentContext(query);` along with a hack inside to get the context is what does the trick. As I cannot rely on custom LINQ methods, like `ToListAsyncLinqToDB` because `WhereBulkContains` is "sandwiched" between standard LINQ calls, I ended up creating temp table myself and then using `.FromSqlRaw(...).Join(...)`. We use custom DB context with tons of metadata attached to it in order to simplify various low level DB operations, which otherwise would require hacks. I can probably post the whole thing but that's a lot of code. – Konstantin Konstantinov Aug 03 '22 at 09:59
  • You have chosen bad ORM for your task. With `linq2db` you do not need hacks. You can express almost ANY SQL and no Raw SQL is needed. And what is `WhereBulkContains`? – Svyatoslav Danyliv Aug 03 '22 at 10:05
  • `LinqToDBForEFTools.GetCurrentContext(query)` from `linq2db ` IS using the hack described here: https://stackoverflow.com/questions/53198376/net-ef-core-2-1-get-dbcontext-from-iqueryable-argument . Anyway, thanks for your help. – Konstantin Konstantinov Aug 04 '22 at 12:08
  • Because EF team won't to support such access. I've even created issue for that. – Svyatoslav Danyliv Aug 04 '22 at 12:10