1

I have a complex IQueryable that I would like EF to populate with a single database query so that I can use it with deferred execution. Please consider the following example.

For these models:

public enum AlphaState { Unknown = '\0', Good = 'G', Bad = 'B' }

[Table("MY_ALPHA")]
public class Alpha
{
    [Key]
    [Column("alpha_index")]
    public long Index { get; set; }

    [Column("alpha_id")] // user-editable field that users call ID
    public string AlphaId { get; set; }

    [Column("deleted")]
    public char? Deleted { get; set; }

    [Column("state")]
    public AlphaState State { get; set; }

    [InverseProperty("Alpha")]
    public ICollection<Bravo> Bravos { get; set; }
}

[Table("MY_BRAVO")]
public class Bravo
{
    [Key]
    [Column("bravo_index")]
    public long BravoIndex { get; set; }

    [ForeignKey("Alpha")]
    [Column("alpha_index")] // actually a 1:0..1 relationship
    public long? AlphaIndex { get; set; }
    public virtual Alpha Alpha { get; set; }

    [InverseProperty("Bravo")]
    public ICollection<Charlie> Charlies { get; set; }
}

[Table("MY_CHARLIE_VIEW")]
public class Charlie
{
    [Key]
    [Column("charlie_index")]
    public int CharlieIndex { get; set; }

    [Column("deleted")]
    public char? Deleted { get; set; }

    [Column("created_at")]
    public DateTime CreatedAt { get; set; }

    [ForeignKey("Bravo")]
    [Column("bravo_index")]
    public long BravoIndex { get; set; }
    public virtual Bravo Bravo { get; set; }

    [ForeignKey("Delta")]
    [Column("delta_index")]
    public long DeltaIndex { get; set; }
    public virtual Delta Delta { get; set; }

    [InverseProperty("Charlie")]
    public virtual ICollection<Delta> AllDeltas { get; set; }
}

[Table("MY_DELTA")]
public class Delta
{
    [Key]
    [Column("delta_index")]
    public long DeltaIndex { get; set; }

    [ForeignKey("Charlie")]
    [Column("charlie_index")]
    public long CharlieIndex { get; set; }
    public virtual Charlie Charlie { get; set; }

    [InverseProperty("Delta")] // actually a 1:0..1 relationship
    public ICollection<Echo> Echoes { get; set; }
}

public enum EchoType { Unknown = 0, One = 1, Two = 2, Three = 3 }

[Table("MY_ECHOES")]
public class Echo
{
    [Key]
    [Column("echo_index")]
    public int EchoIndex { get; set; }

    [Column("echo_type")]
    public EchoType Type { get; set; }

    [ForeignKey("Delta")]
    [Column("delta_index")]
    public long DeltaIndex { get; set; }
    public virtual Delta Delta { get; set; }
}

...consider this query:

IQueryable<Alpha> result = context.Alphas.Where(a => a.State == AlphaState.Good)
                                         .Where(a => !a.Deleted.HasValue)
                                         .Where(a => a.Bravos.SelectMany(b => b.Charlies)
                                                             .Where(c => !c.Deleted.HasValue)
                                                             .Where(c => c.Delta.Echoes.Any())
                                                             .OrderByDescending(c => c.CreatedAt).Take(1)
                                                             .Any(c => c.Delta.Echoes.Any(e => e.Type == EchoType.Two)))
var query = result as System.Data.Objects.ObjectQuery;
string queryString = query.ToTraceString();

NOTE: Charlie is actually a view on a table; Delta has an FK to Charlie's table, but the view gives a fake FK for the most recent Delta linked to that Charlie, so the model uses that because the plan is to use EF only for querying, never for updating.

I would like this query to be populated by a single database query, but as written that's not what's happening. How can I modify this query to get the same results but have EF simply build the condition into the results IQueryable instead of pre-fetching data for it?

How I Know It's Using Two Queries

I know for sure that it's splitting into multiple queries because, for reasons beyond the scope of this question, I deliberately gave the context a bad connection string. result is an IQueryable, so it should be using deferred execution and not actually attempt to retrieve any data until it's used, but I'm getting a connection failed exception as soon as I declare it.

Background

We have an existing database structure, database access layer, and several hundred thousand lines of code using said structure & DAL. We'd like to add a UI to allow users to build complex queries of their own, and EF seemed like a good way to build an underlying model for that. We've never used EF before, though, so the Powers That Be have declared that it cannot ever connect to the database; we should use EF to generate an IQueryable, extract the query string from it, and use our existing DAL to run the query.

Oblivious Sage
  • 3,326
  • 6
  • 37
  • 58
  • 2
    You might want to check your assumption using: http://stackoverflow.com/a/20751723/4812586 or https://msdn.microsoft.com/en-us/data/dn469464.aspx – jjj Jun 04 '15 at 19:14
  • can you do that in SQL without sub query? – Senad Meškin Jun 04 '15 at 19:18
  • @SenadMeškin The given SQL query accomplishes the same thing without a subquery. I'm actually totally OK with EF using a subquery, I just want the subquery to happen in the same statement at the same time. – Oblivious Sage Jun 04 '15 at 19:22
  • Note, that the lambda passed to Where is an expression tree. EF can interpret it and it fact it does. This works just fine. – usr Jun 04 '15 at 19:32
  • Then you don't need any – Senad Meškin Jun 04 '15 at 19:37
  • 1
    I believe that EF will connect to the database when you access the `ObjectQuery` property. It does this, not to execute the query, but to determine what version of SQL Server you're using so as to be able to supply the `ProviderManifestToken` value for the entity connection string. This is a hangover from the early days of EF. – Olly Jun 04 '15 at 20:46
  • @Olly I don't think I'm accessing an `ObjectQuery` property; I'm just casting it to an `ObjectQuery` and calling `ToTraceString()`. – Oblivious Sage Jun 04 '15 at 20:47
  • 2
    @ObliviousSage investigate this. Make the connection valid so that you don't get an exception. Then use SQL Profiler to see what is being executed. Maybe it's just one real query. – usr Jun 04 '15 at 20:49
  • EF generates different models internally for SQL 2005 and SQL 2008 databases. It makes a connection and queries the server's metadata to determine what features it can use in its query definition. It can't output a SQL trace string, or even generate a query, without knowing this. It can't know this without connecting. Use a profiler, as per the suggestion from @usr, to observe this. – Olly Jun 04 '15 at 20:50
  • 1
    By default EF won't translate your `IQueryable<>` to a provider query immediately. However, various methods force it to do this, including any trace method that outputs T-SQL. – Olly Jun 04 '15 at 20:53
  • @Olly That wouldn't surprise me, but as I said (somewhere in the depths of the question), I'm getting a connection failed exception on the line where I declare `result`. It can't be part of the `ToTraceString()` call because it's not even making it to that point. – Oblivious Sage Jun 04 '15 at 20:53
  • @ObliviousSage I just ran your code with a profiler attached. It took a bit of tweaking to make it run, but the assignment line *does* open the connection, but does *not* retrieve any results. The query is, as expected, executed lazily. (I had to make sure that database initialization is disabled otherwise a whole boatload of code runs on that first line to initialize the db.) – Olly Jun 04 '15 at 21:08
  • I've never seen EF running 2 SQL queries in the same "ToList" method (different is for DML and DDL). SQL Profiler, like already suggested, is the best way to check it – bubi Jun 05 '15 at 07:29

3 Answers3

3

How I Know It's Using Two Queries

What you're observing isn't EF starting to run your query. After you assign a query to your result variable, you still have a query definition, not a result set. If you attach a profiler to your database, you'll see that no SELECT statement has executed for your query.

So, why is as connection being made to the database? The reason is that the first time you build a query for a given derived DbContext type, EF builds and caches its in-memory model for that type. It does this by applying various conventions to the types, properties and attributes that you've defined. In theory this process doesn't need to connect to the database, but the provider for SQL Server does so anyway. It does this to determine the version of SQL Server you're using so it can work out whether it can use more recent SQL Server features in the model that it builds.

It's interesting that this model is cached for the type, not the instance of the context. You can see this by disposing of your context then creating a new one and repeating the lines of code that build a query. In the second instance you wouldn't see a connection to the database at all, since EF would use its cached model for your context type.

the Powers That Be have declared that it cannot ever connect to the database; we should use EF to generate an IQueryable, extract the query string from it, and use our existing DAL to run the query.

Since you need to avoid having EF connect to the database at all, you can see my post here which contains details of how you can provide this information up-front in code instead.

Also, be aware that there's another reason why EF might connect to your server the first time it encounters your DbContext type: initialization. Unless you've disabled initialization (with something like Database.SetInitializer<MyContext>(null)) it will check that the database exists and try to create it if not.

Note that you can call ToString() directly on an EF code first query to get the T-SQL query. You don't need to go through the intermediate ObjectQuery.ToTraceString() method, which is actually part of the legacy EF API. However, both of these methods are intended for debugging and logging purposes. It is rather unusual to use EF for building queries but not executing them. You are likely to encounter problems with this approach—most obviously when EF determines that it should generate a parameterized query. Also, there's no guarantee that different versions of EF will generate similar T-SQL for the same input, so your code may end up being rather brittle as you update to newer EF versions. Make sure you have plenty of tests!

If you are concerned about having users connect to the database directly—and that's an entirely legitimate security concern—you might consider an alternative approach. I haven't much experience with it, but it seems that OData might be a good fit. This allows you to build queries on a client, serialize them over a remote connection, then re-create them on your server. Then, on the server, you can execute them against your database. Your client needs to know nothing about the database whatsoever.

If you do decide (or are instructed) to persevere with the approach you detailed in your question, do spend the time learning how to profile a SQL Server connection. This will be an absolutely essential tool for you in working out how EF is translating your queries.

Community
  • 1
  • 1
Olly
  • 5,966
  • 31
  • 60
2
context.Foos.Where(f => f.Bars.Any(b => b.SomeOtherData == "baz"));

I tried a query similar to yours on a database that I have (using LINQPad), and I ended up with

SELECT 
  [Extent1].[Property1] AS [Property1]
  -- other properties of Foo
  FROM [dbo].[Foos] AS [Extent1]
  WHERE EXISTS (SELECT 
    1 AS [C1]
    FROM [dbo].[Bars] AS [Extent2]
    WHERE ([Extent1].[FooId] = [Extent2].[FooId]) AND (N'baz' = [Extent2].[SomeOtherData])
  )

...which definitely looks like a single query to me.

Expressions within IQueryable functions don't get executed directly -- they're used to generate SQL, which is then used to execute the query when the results need to be materialized.

jjj
  • 4,822
  • 1
  • 16
  • 39
  • Hmmmm. My *actual* query is significantly more complicated than the example I used; I thought I could simplify the 5 models involved down to 2 and preserve the behavior I was getting, but maybe not. Let me edit my question. – Oblivious Sage Jun 04 '15 at 19:33
  • I've edited my question to give an example that's much closer to my actual query. – Oblivious Sage Jun 04 '15 at 20:10
-1

Try using LINQ query syntax.

var result = (
    from f in foo
    join b in bar
        on f.fooid equals b.fooid
    where b.someotherdata = "baz"
    select new { f.fooid, f.somedata }
).Distinct().ToEnumerable();

This will be deferred until enumerated.

Nick Vaccaro
  • 5,428
  • 6
  • 38
  • 60
  • Just hopped on by to see why I got a downvote. To my surprise, the question went though so many edits, it would be impossible to determine its initial state without going back through the history. As such, it's hard to tell what question I actually tried to answer! – Nick Vaccaro Jun 08 '15 at 13:31