1

I have a simple linq query that gets a slug from a database for a product.

var query = from url in urlTable
            where url.ProductId == productId &&
            url.EntityName == entityName &&
            url.IsActive
            orderby url.Id descending
            select url.Slug

I am trying to optimize this, since it is run for every product and on a category page this is run x times the number of products.

I could do this (if i'm not mistaking), send in a list of products and do a new query.

var query = from url in urlTable
            where productList.Contains(url.ProductId) &&
            url.EntityName == entityName &&
            url.IsActive
            orderby url.Id descending
            select url.Slug

But I have read somewhere that the performance of Contains is bad. Is there any other way to do this? What is the best method performance wise?

Luaan
  • 62,244
  • 7
  • 97
  • 116
noshitsherlock
  • 1,103
  • 2
  • 11
  • 28
  • I think query 1 is right you can not optimize more than this – Amit Mar 04 '14 at 08:24
  • 1
    Put an index on `ProductId` with an include on the columns you really need and select only those specific columns in your query. Also, what is `productList`? Can you use a join? An alternative is to cache those slugs in, for example, a dictionary. – Silvermind Mar 04 '14 at 08:27
  • Ordering can be a performance problem. Do you need that order when you dont use the id? – stepandohnal Mar 04 '14 at 08:30
  • possible duplicate of [Why does the Contains() operator degrade Entity Framework's performance so dramatically?](http://stackoverflow.com/questions/7897630/why-does-the-contains-operator-degrade-entity-frameworks-performance-so-drama) – gdoron Mar 04 '14 at 08:32
  • If you read the question above, you will see there was indeed a problem with `contains` (probably only with huge lists with thousands of elements or even only with ten thousands), and it was fixed in EF 6. Good luck. – gdoron Mar 04 '14 at 08:34

3 Answers3

2

But I have read somewhere that the performance of Contains is bad.

I believe you're mixing this up with string.Contains, which indeed is a bad idea on large data sets, because it simply can't use any index at all.

In any case, why are you guessing on performance? You should profile and see what's better for yourself. Also, look at the SQL produced by each of the queries and look at their respective query plans.

Now, with that out of the way, the second query is better, simply because it grabs as much as it can during one query, thus removing a lot of the overhead. It isn't too noticeable if you're only querying two or three times, but once you get into say a hundred, you're in trouble. Apart from being better in the client-server communication, it's also better on the server, because it can use the index very effectively, rather than looking up X items one after another. Note that that's probably negligible for primary keys, which usually don't have a logarithmic access time.

Luaan
  • 62,244
  • 7
  • 97
  • 116
  • 2
    [Why does the Contains() operator degrade Entity Framework's performance so dramatically?](http://stackoverflow.com/q/7897630/601179) – gdoron Mar 04 '14 at 08:41
1

The second option is better. I would add the product-id to the result so you can differentiate between products.

var query = from url in urlTable
            where productList.Contains(url.ProductId) &&
            url.IsActive
            orderby url.Id descending
            select new { ProductId, Slug }

Please note that your list of product-id's is converted to sql-parameters IN (@p1, @p2, @p3) and there is a maximum amount of sql-parameters per sql-query. I think limit is somewhere around 2000 parameter. So if you are quering for more than 2000 products, this solution will not work.

Maarten
  • 22,527
  • 3
  • 47
  • 68
-1
var query = from productId in productList
            join url in urlTable on productId equals url.ProductId
            where url.IsActive
            orderby url.Id descending
            select url.Slug;

I believe this query would have a better performance.

Shlomi Borovitz
  • 1,700
  • 9
  • 9
  • This either throws a runtime exception since you are joining an in-memory list with a sql-table, or EF pulls the complete sql-table in-memory, and joins it there, which is inefficient. – Maarten Mar 04 '14 at 08:47
  • 1
    This is of course assuming that urlTable is a sql-table, and productList is an in-memory list. The OP is not explicitly clear about this, but this is what I get from the question. – Maarten Mar 04 '14 at 08:48
  • Of course, but there is not reason to believe that products resides outside the DB while `urlTable` do. `productList` could be just a query to product table: `productTable.Select(product => product.ID)`... – Shlomi Borovitz Mar 04 '14 at 08:52
  • The question states 'send in a list of products and do a new query', which I interpreted as an in-memory list of products. But you could be right of course. – Maarten Mar 04 '14 at 08:53
  • It sounds like, when accessing a category, he queries the products under that category - and if that's the case, the join would have the best performance (assuming indexes do exists in the DB, and that `productList` is just a query to the DB). – Shlomi Borovitz Mar 04 '14 at 08:59