3

I have a situation where my application constructs a dynamic LINQ query using PredicateBuilder based on user-specified filter criteria (aside: check out this link for the best EF PredicateBuilder implementation). The problem is that this query usually takes a long time to run and I need the results of this query to perform other queries (i.e., joining the results with other tables). If I were writing T-SQL, I'd put the results of the first query into a temporary table or a table variable and then write my other queries around that. I thought of getting a list of IDs (e.g., List<Int32> query1IDs) from the first query and then doing something like this:

var query2 = DbContext.TableName.Where(x => query1IDs.Contains(x.ID))

This will work in theory; however, the number of IDs in query1IDs can be in the hundreds or thousands (and the LINQ expression x => query1IDs.Contains(x.ID) gets translated into a T-SQL "IN" statement, which is bad for obvious reasons) and the number of rows in TableName is in the millions. Does anyone have any suggestions as to the best way to deal with this kind of situation?

Edit 1: Additional clarification as to what I'm doing.

Okay, I'm constructing my first query (query1) which just contains the IDs that I'm interested in. Basically, I'm going to use query1 to "filter" other tables. Note: I am not using a ToList() at the end of the LINQ statement---the query is not executed at this time and no results are sent to the client:

var query1 = DbContext.TableName1.Where(ComplexFilterLogic).Select(x => x.ID)

Then I take query1 and use it to filter another table (TableName2). I now put ToList() at the end of this statement because I want to execute it and bring the results to the client:

var query2 = (from a in DbContext.TableName2 join b in query1 on a.ID equals b.ID select new { b.Column1, b.column2, b.column3,...,b.columnM }).ToList();

Then I take query1 and re-use it to filter yet another table (TableName3), execute it and bring the results to the client:

var query3 = (from a in DbContext.TableName3 join b in query1 on a.ID equals b.ID select new { b.Column1, b.column2, b.column3,...,b.columnM }).ToList();

I can keep doing this for as many queries as I like:

var queryN = (from a in DbContext.TableNameN join b in query1 on a.ID equals b.ID select new { b.Column1, b.column2, b.column3,...,b.columnM }).ToList();

The Problem: query1 is takes a long time to execute. When I execute query2, query3...queryN, query1 is being executed (N-1) times...this is not a very efficient way of doing things (especially since query1 isn't changing). As I said before, if I were writing T-SQL, I would put the result of query1 into a temporary table and then use that table in the subsequent queries.

Edit 2:

I'm going to give the credit for answering this question to Albin Sunnanbo for his comment:

When I had similar problems with a heavy query that I wanted to reuse in several other queries I always went back to the solution of creating a join in each query and put more effort in optimizing the query execution (mostly by tweaking my indexes).

I think that's really the best that one can do with Entity Framework. In the end, if the performance gets really bad, I'll probably go with John Wooley's suggestion:

This may be a situation where dropping to native ADO against a stored proc returning multiple results and using an internal temp table might be your best option for this operation. Use EF for the other 90% of your app.

Thanks to everyone who commented on this post...I appreciate everyone's input!

HydroPowerDeveloper
  • 3,302
  • 7
  • 31
  • 38
  • You need to avoid converting your initial IQueryable to an IEnumerable (via ToList/ToArray/AsEnumerable/etc) if you want to keep your query directly on the server. EF should compose the query correctly over joined expression trees. You can't pass more than 2000 parameters using Contains over an IEnumerable. Why isn't query1IDs IQueryable? – Jim Wooley Sep 28 '11 at 21:05
  • Hi Jim, thanks for your comments. Please refer to Edit 1 in my original question for more clarification as to what I'm trying to do. The problem isn't keeping the query on the server (I surely want to do that for the reasons that you gave)...the problem is that I don't want to keep executing query1 because it is very slow. – HydroPowerDeveloper Sep 29 '11 at 03:58
  • It might be possible to use a temporary table with Linq To Entities. I had done this will Linq2Sql, but don't know enough about Linq To Entities to say whether something similar could be done. See my reply to this question http://stackoverflow.com/questions/6122185/avoiding-the-2100-parameter-limit-in-linq-to-sql/6125328#6125328 for more details. – sgmoore Sep 29 '11 at 08:23
  • Hello sgmoore! Yes! This is the kind of thing that I'd like to do with Linq2Entities...however, I'm not sure that it is possible. I'm going to investigate this a bit further. If it is possible in Entity Framework, I think that the implementation will be a little more complicated. It sounds like I would have to edit the *.edmx file on the fly---this might not be possible. Does anyone have any ideas as to how one could implement sgmoore's idea in Linq2Entities? – HydroPowerDeveloper Sep 29 '11 at 11:16
  • The downside of @sgmoore's option is that it doesn't work in highly concurrent environments. It also requires admin (create table/delete table) permissions in the database which is often a security vulnerability. With L2S, I could recommend using a stored proc returning IMultipleResults, but EF doesn't support those yet. This may be a situation where dropping to native ADO against a stored proc returning multiple results and using an internal temp table might be your best option for this operation. Use EF for the other 90% of your app. – Jim Wooley Sep 29 '11 at 13:58
  • @Jim Wooley - I'm coming to same conclusion. Entity Framework is absolutely fantastic for a lot of (most?) things, but when performance is absolutely critial, it might become necessary to deal with the situation at a lower level (i.e., native ADO). Thanks for your comments! – HydroPowerDeveloper Sep 29 '11 at 14:47
  • @JimWooley. Either you missed the fact that this is dealing with temporary tables or you know something I don't. If the latter, then can you be more specific about what issues this should cause in highly concurrent environments and also how do you configure a user without permission to create a temporary table. – sgmoore Sep 29 '11 at 16:31

3 Answers3

2

If the size of TableName is not too big to load the whole table you use

var tableNameById = DbContext.TableName.ToDictionary(x => x.ID);

to fetch the whole table and automatically put it in a local Dictionary with ID as key.

Another way is to just "force" the LINQ evaluation with .ToList(), in the case fetch the whole table and do the Where part locally with Linq2Objects.

var query1Lookup = new Hashset<int>(query1IDs);
var query2 = DbContext.TableName.ToList().Where(x => query1IDs.Contains(x.ID));

Edit:
Storing a list of ID:s from one query in a list and use that list as filter in another query can usually be rewritten as a join.
When I had similar problems with a heavy query that I wanted to reuse in several other queries I always went back to the solution of creating a join in each query and put more effort in optimizing the query execution (mostly by tweaking my indexes).

Albin Sunnanbo
  • 46,430
  • 8
  • 69
  • 108
  • Thanks, Albin, for responding to my question. The problem is that TableName contains millions of rows and the number of IDs in query1IDs are in the hundreds, sometimes even thousands. Returning the entire TableName to the client is not an option. – HydroPowerDeveloper Sep 28 '11 at 15:23
  • **Albin:** I think that your suggestion (using a query join and tweaking the indexes) is as good as I'm going to get with Entity Framework. It's too bad that I'll have to re-execute the first query every time that I need it---if the performance isn't acceptable, I'll just have to man up and write my own T-SQL and use the usual ADO.NET objects. I'm going to keep this question open for another day just to see if there are any other ideas, but if not, I'll mark your answer as the accepted answer. Thanks again for your interest in my question! – HydroPowerDeveloper Sep 28 '11 at 20:21
  • @HydroPowerDeveloper If you have lots of joins, grouping or aggregating functions in your query you may have good luck by creating an indexed view. – Albin Sunnanbo Sep 29 '11 at 05:18
  • Unfortuantely, a view will not be of much help. The query is dynamic (determined at run-time) and is based on user inputs...I have no idea which tables will be used, how they will be joined, or what kind of filtering criteria will be specified. I think your suggestion of optimizing query execution by tweaking indexes is the best that I can do. I'm looking into a couple of other ideas (see comments by sgmoore and Anas Karkoukli), but I'm not sure that they are going to work. – HydroPowerDeveloper Sep 29 '11 at 12:12
1

Since you are running a subsequent query off the results, take your first query and use it as a View on your SQL Server, add the view to your context, and build your LINQ queries against the view.

counsellorben
  • 10,924
  • 3
  • 40
  • 38
  • Thanks for your suggestion, counsellorben! However, I can't create a static view on the SQL Server because the first query is dynamic (the query is created by users applying different filtering criteria). Besides, with a view, the T-SQL behind it will run every time it is queried...that's one of the things that I want to avoid because it takes a long time to run. – HydroPowerDeveloper Sep 28 '11 at 15:53
1

Have you considered composing your query as per this article (using the decorator design pattern):

Composed LINQ Queries using the Decorator Pattern

The premise is that, instead of enumerating your first (very constly) query, you basically use the decorator pattern to produce a chain of IQueryable that is a result of query 1 and query N. This way you always execute the filtered form of the query.

Hope this might help

Anas Karkoukli
  • 1,342
  • 8
  • 13
  • Thanks, Anas, for your suggestion and link, although I'm not certain that this will help me. As I mentioned in my edit, the problem is the multiple execution of query1...from what I understand, if I use the "decorator pattern" approach, query1 will still be executed multiple times. Am I right in that understanding? – HydroPowerDeveloper Sep 29 '11 at 04:11
  • Most welcome. In your case, should you use query wrapper, query1 will be a subquery of some sort joined into query2...N, so it will be executed in the context on a SQL join with the other queries, and not as a seperate query executed multiple times, which I guess is what you're trying to do by using the IDs returned by query1 as a filtering mechanism for the others. I do hope you find a satifactory solution soon.. good luck. – Anas Karkoukli Sep 29 '11 at 04:59
  • Anas...this is a very interesting idea. I've read some more about the "decorator pattern" and I'd be lying if I said that I fully understood it. However, from what I've read, it seems promising. I'm going to try some experiments as soon as I get into work. I'll let you know if I've been successful or not. If this does work, then I think that this is something that every EF programmer should know. Thanks again for your suggestion! – HydroPowerDeveloper Sep 29 '11 at 11:32
  • 1
    Anas...maybe I'm not understanding the "decorator pattern" correctly, but it seems that it is used for "chaining" operations. I'm not seeing how this is going to help me. I'm not chaining queries 2 to N...they completely independent of each other...they just happen to share the same "filtering table" (i.e., query1). I admit that I don't fully understand the "decorator pattern", so any enlightenment would be appreciated! =) – HydroPowerDeveloper Sep 29 '11 at 12:14