1

I'm trying to improve the performance of a query that uses linq and fluent API.

The query looks like this:

        var result = DbContext.Set<Parent>()
            .Join(DbContext.Set<Child>(),
                (parent) => parent.Id,
                (child) => child.ParentId,
                (parent, child) => cild)
            .Select(x => new { x.SomeOtherId, x.AnotherId })
            .Where(x => !filters.Contains(x.SomeOtherId))
            .Select(x => x.AnotherId )
            .Distinct()
            .ToListAsync(cancellationToken);

As I understand, .Contains degrades the performance of queries such as this and using a Join is better, e.g

.Join(filters, (c) => c.SomeOtherId, (filterId) => filterId, (c, filterId) => c.AnotherId);

This will give results where there is a match, however how would I find results where there is not a match?

Thanks

jarlh
  • 42,561
  • 8
  • 45
  • 63
Joshua Dunn
  • 71
  • 1
  • 6
  • 2
    ".Contains degrades the performance" Sure? Did you measure that? What **is** `filters`? If it´s a materialized `List` or similar, that should be fine. If it is a query in itself, the query will be executed again and again. – MakePeaceGreatAgain Mar 03 '20 at 12:16
  • [Except](https://learn.microsoft.com/en-us/dotnet/api/system.linq.enumerable.except?view=netframework-4.8)? – styx Mar 03 '20 at 12:17
  • *"As I understand, .Contains degrades the performance"* have you actually profiled this to assure that this is your bottleneck? – Paul Kertscher Mar 03 '20 at 12:17
  • @HimBromBeere https://www.dotnettricks.com/learn/entityframework/tips-to-improve-entity-framework-performance point 08. filters is IEnumerable – Joshua Dunn Mar 03 '20 at 12:18
  • 1
    [Is SQL IN bad for performance?](https://stackoverflow.com/questions/1013797/is-sql-in-bad-for-performance) – xdtTransform Mar 03 '20 at 12:19
  • @Rup set list in memory - 5 - 10 records. – Joshua Dunn Mar 03 '20 at 12:23
  • not reasonable to look for alternative for such a short list. If the contains and data reach the metric ton . Then a bluck insert and a join could be a solution. If the join condition can not be retrieve from existing database in the first place. – xdtTransform Mar 03 '20 at 12:32
  • @JoshuaDunn there's nothing wrong with `Contains`. Whatever ORM you use will convert this LINQ query to a SQL statement. Frankly, if you have to use `JOIN` in an ORM query, something is wrong with the DbContext in the first place. – Panagiotis Kanavos Mar 03 '20 at 12:56
  • 4
    @JoshuaDunn `!Contains` generates a `SomeOtherId NOT IN (id1,id2,ide,...)` whose performance depends on whether **`SomeOtherId`** is indexed or not. `Contains` doesn't degrade performance. The lack of indexing does – Panagiotis Kanavos Mar 03 '20 at 12:57
  • @JoshuaDunn as for that web site link you posted, the bad English grammar and total lack of explanation should warn you that something's not quite right with that article. – Panagiotis Kanavos Mar 03 '20 at 13:01
  • @PanagiotisKanavos the article I linked states that Contains converts to WHERE IN, which can affect performance as you have said yourself. I'm interested in why using Join here is cause for concern, can you explain? – Joshua Dunn Mar 03 '20 at 14:59
  • @JoshuaDunn I said nothing of the kind. I said that this article made a blanket statement, without justification. If `SomeOtherId` is covered by an index you *won't* get bad performance due to `NOT IN`. You'd probably get bad performance because the server would still have to read eg 100K-5 rows. If the `NOT IN` filtered out eg 80 rows out of 100, the server still would use the index. – Panagiotis Kanavos Mar 03 '20 at 15:16
  • @JoshuaDunn I also said this query is really weird. ORMs generate joins based on relations. If you have to write JOINs yourself, something's wrong. In this case this entire query doesn't deal with objects at all. What are you trying to do? Perform some kind of hierarchical query? That's not what ORMs are for. It would be a *lot* easier to write this in SQL, perhaps as a view or stored procedure you can map to – Panagiotis Kanavos Mar 03 '20 at 15:21
  • For example, a simple join can only scan a single level in a hierarchy. A recursive CTE though, or a `hierarchyid` field allow you to scan an entire tree. No micro-optimization on the ORM side can give you that simplicity, maintainability and performance – Panagiotis Kanavos Mar 03 '20 at 15:23
  • @PanagiotisKanavos Apologies, what I meant was the combination of indexing and contains will cause the performance hit and in this case SomeId is not indexed so I'm expecting it will negatively impact it. I used this as example as the query is smaller but I'm querying data for reporting & visualisation, which is filtered based on IDs retrieved from this query. I use !Contains() there as well, but the dataset is potentially a lot larger, so I was wondering if there was a replacement for this syntax that I could apply there. But it seems like rewriting in raw SQL is a better option. – Joshua Dunn Mar 03 '20 at 15:32

0 Answers0