
I need to run a query against my database that might look something like this, where there could realistically be 100 or more search terms.

public IQueryable<Address> GetAddressesWithTown(string[] towns)
{
    IQueryable<Address> addressQuery = DbContext.Addresses;
    addressQuery = addressQuery.Where( x => towns.Any( y => x.Town == y ) );
    return addressQuery;
}

However, when it contains more than about 15 terms, it throws an exception on execution because the generated SQL is too long.

Can this kind of query be done through Entity Framework?

What other options are there available to complete a query like this?

Neil

3 Answers


Sorry, are we talking about THIS EXACT SQL?

In that case it is a very simple "open your eyes" thing.

There is a way (Contains) to map that array into an IN clause, which results in ONE SQL condition (town in ('', '', '')).

Let me see whether I get this right:

addressQuery.Where( x => towns.Any( y=> x.Town == y ) );

should be

addressQuery.Where( x => towns.Contains(x.Town) );

The resulting SQL will be a LOT smaller. 100 items is still taxing it - I would dare say you may have a db or app design issue here that requires a business-side analysis; I have not met this requirement in 20 years of working with databases.
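Applied to the method from the question, the whole fix is a one-line change (a sketch, assuming the same DbContext as above):

```csharp
public IQueryable<Address> GetAddressesWithTown(string[] towns)
{
    // Contains translates to a single IN clause, roughly:
    //   SELECT ... FROM Addresses WHERE Town IN (@p0, @p1, ...)
    return DbContext.Addresses.Where(x => towns.Contains(x.Town));
}
```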

TomTom
  • +1 This definitely helps; the generated sql still isn't very good, but it's much, much better than before; and it's also a very simple change. – Eamon Nerbonne Jan 19 '13 at 13:03
  • Imagine a situation where you have a list of devices and you want to deliver content to those devices. The client asks for specific towns where they have stores and want to target those towns. Nationwide they might have 300 stores hence you want all devices that are located in the towns with stores. – Neil Jan 19 '13 at 13:07
  • ...or really any aggregate set query in which one of the sets isn't already in the database. – Eamon Nerbonne Jan 19 '13 at 13:13
  • 1
    It's also worth noting that this solution can heavily pollute the query plan cache of your server, which may matter as queries grow more complex. – Eamon Nerbonne Jan 19 '13 at 13:16
  • @Neil No client will have this "on demand". Stuff like this sort of grouping is often done in an editor (and then kept in the database). You are better off, strategically, to find a structure where you can use a database side join instead of injecting the list of towns into the SQL, especially when it gets larger. – TomTom Jan 19 '13 at 13:37
  • 1
    @EamonNerbonne Yes, correct. Let's bother about a little query plan pollution instead of using nice SQL with a decent performance. Query plan pollution is normally not THAT much a problem. On top, it is not lke the SQL Server team is full of idiots - they are quite nice in optimizing that. I call you on premature optimization. – TomTom Jan 19 '13 at 13:39
  • @TomTom: In my experience, you'll need bulk-insert or TVPs sometime anyway, and if you have the coding infrastructure for that in place, it's the most appropriate solution to this problem; precisely because it allows a database-side join rather than injecting a list of towns into the SQL. In any case, I'm not suggesting that this solution is bad - I think it's a *good* solution. I just don't think that your out-of-hand dismissal of TVPs or bulk insert makes any sense. – Eamon Nerbonne Jan 19 '13 at 16:02
  • On the topic of premature optimization: of *course* it's premature optimization in an application that consists solely of this one query in which no more than 100 towns can be passed. Whether that's the case in any particular application will depend on the specifics. Having encountered exactly this problem in practice I can assure you it's not theoretical. (Also, you seem to suggest that queries necessarily cannot share the same query plan if they are to be "nice SQL with a decent performance": but that's not the case). – Eamon Nerbonne Jan 19 '13 at 16:10
  • I'll give this a try on Monday, thanks. Would this kind of thing be better being done on a NoSql database on a map reduce or something like that? I can utilize couchbase if I want and cache all the addresses and then perform queries on the cache – Neil Jan 19 '13 at 18:13
  • Thanks, this has got the query working and running a heck of a lot faster. – Neil Jan 21 '13 at 12:19

This looks like a scenario where you'd want to use the PredicateBuilder, as it will help you create an OR-based predicate and construct your dynamic lambda expression.

This is part of a library called LinqKit, by Joseph Albahari, who created LINQPad.

public IQueryable<Address> GetAddressesWithTown(string[] towns)
{
    var predicate = PredicateBuilder.False<Address>();

    foreach (string town in towns)
    {
        string temp = town; // capture a copy so each closure gets its own value
        predicate = predicate.Or(p => p.Town.Equals(temp));
    }

    // AsExpandable (also from LinqKit) lets Entity Framework translate the combined predicate
    return DbContext.Addresses.AsExpandable().Where(predicate);
}
scartag

You've broadly got two options:

  • You can replace .Any with a .Contains alternative.
  • You can use plain SQL with table-valued-parameters.

Using .Contains is easier to implement and will help performance because it translates to an inline SQL IN clause; so 100 towns shouldn't be a problem. However, it also means that the exact SQL depends on the exact number of towns: you're forcing SQL Server to recompile the query for each number of towns. These recompilations can be expensive when the query is complex, and they can evict other query plans from the cache as well.

Using table-valued parameters is the more general solution, but it's more work to implement, particularly because it means you'll need to write the SQL query yourself and cannot rely on Entity Framework. (Using ObjectContext.Translate you can still unpack the query results into strongly-typed objects, despite writing the SQL yourself.) Unfortunately, you cannot yet use Entity Framework to pass a lot of data to SQL Server efficiently: it supports neither table-valued parameters nor temporary tables (both are commonly requested features).

A bit of TVP SQL would look like this:

    select ... from ... join @townTableArg townArg on townArg.town = address.town

or

    select ... from ... where address.town in (select town from @townTableArg)
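For illustration (not from the original answer), wiring a TVP up from C# with plain ADO.NET might look roughly like this; the type name dbo.TownList, the column name town, and the objectContext field are assumptions:

```csharp
// Assumption: a table type has been created on the server, e.g.
//   CREATE TYPE dbo.TownList AS TABLE (town NVARCHAR(100) NOT NULL);
// Assumption: objectContext is the ObjectContext underlying your DbContext.
public List<Address> GetAddressesWithTown(SqlConnection connection, string[] towns)
{
    var table = new DataTable();
    table.Columns.Add("town", typeof(string));
    foreach (string town in towns)
        table.Rows.Add(town);

    using (var cmd = connection.CreateCommand())
    {
        cmd.CommandText =
            "select a.* from Addresses a join @townTableArg t on t.town = a.Town";
        var p = cmd.Parameters.Add("@townTableArg", SqlDbType.Structured);
        p.TypeName = "dbo.TownList";
        p.Value = table;

        using (var reader = cmd.ExecuteReader())
            return objectContext.Translate<Address>(reader).ToList();
    }
}
```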

You can probably work around the EF restriction, but it's not going to be fast and will probably be tricky. One workaround would be to insert your values into some intermediate table and then join with that - that's still 100 inserts, but those are separate statements. If a future version of EF supports batch CUD statements, this might actually work reasonably well.

Almost equivalent to table-valued parameters would be to bulk-insert into a temporary table and join with that in your query. Mostly that just means your table name will start with '#' rather than '@' :-). The temp table has a little more overhead, but you can put indexes on it, and in some cases that means the subsequent query will be much faster (for really huge data quantities).
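A bulk-insert sketch along those lines (assuming an open SqlConnection and a DataTable of towns; the temp table shape is illustrative):

```csharp
// Create the temp table; it lives for the lifetime of this connection,
// so the subsequent join must run on the same connection.
using (var create = new SqlCommand(
    "create table #towns (town nvarchar(100) not null primary key)", connection))
{
    create.ExecuteNonQuery();
}

// Bulk-insert the towns (townTable: a DataTable with a single 'town' column).
using (var bulk = new SqlBulkCopy(connection))
{
    bulk.DestinationTableName = "#towns";
    bulk.WriteToServer(townTable);
}

// Then query with an ordinary join, e.g.:
//   select a.* from Addresses a join #towns t on t.town = a.Town
```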

Unfortunately, using either temporary tables or bulk insert from C# is a hassle. The simplest solution here is to make a DataTable; this can be passed to either. However, DataTables are relatively slow; the overhead might be relevant once you start adding millions of rows. The fastest (general) solution is to implement a custom IDataReader; almost as fast is an IEnumerable<SqlDataRecord>.
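A sketch of the streaming approach (the column metadata here is an assumption about the table shape):

```csharp
// Streams rows one at a time instead of materializing a DataTable;
// the result can be assigned to the Value of a Structured SqlParameter.
static IEnumerable<SqlDataRecord> ToRecords(IEnumerable<string> towns)
{
    var meta = new SqlMetaData("town", SqlDbType.NVarChar, 100);
    foreach (string town in towns)
    {
        var record = new SqlDataRecord(meta);
        record.SetString(0, town);
        yield return record;
    }
}
```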

By the way, to use a table-valued-parameter, the shape ("type") of the table parameter needs to be declared on the server; if you use a temporary table you'll need to create it too.


Eamon Nerbonne
  • -1. He is not using entity framework to pass a lot of data to start with. Funny how everyone throws architectural advice around and totally ignores that the particular LINQ statement is not what you should touch in SQL for a simple IN clause. – TomTom Jan 19 '13 at 13:05
  • I'm surprised you think so; but to each their own. It's not a matter of architecture; it's just that the SQL IN statement applied to explicit scalar parameters is a poor match for a variable-sized set. Unfortunately, the alternatives SQL Server provides aren't supported by the entity framework. – Eamon Nerbonne Jan 19 '13 at 13:09
  • @TomTom: could you explain "the particular LINQ statement is not what you should touch in SQL for a simple IN clause" - I don't understand how this relates to my answer. – Eamon Nerbonne Jan 19 '13 at 13:18
  • are you joking? His example is "find where town is one of a list of 100". Any sensible person would use an IN statement in SQL (town IN [list of towns]) instead of 100 "town like x" statements. THIS is his problem, as per example and I was right as per his comment on my answer. – TomTom Jan 19 '13 at 13:36
  • Of course - but the question is whether you do `IN (...variable-length-param-list...)` or `IN (select * from @tablevar)` - and the latter is much faster, and allows for more input, and is designed for precisely that purpose. Your suggestion to use 100 "town like x" statements has nothing to do with this answer. If you want these kind of set queries commonly then I stand by the TVP or bulk-insert approaches; these are the only solutions that really work well. – Eamon Nerbonne Jan 19 '13 at 15:53