0

This seems like it would be really simple, but I cannot seem to resolve it. Using EF Core, I have a DbSet<Rule> Rules in my DbContext.

public class Rule
{
    public int Id { get; set; }
    public string Raw { get; set; }
}

I am trying to write a query where, given an IEnumerable<string> lines, give me all of the Rules from the DbSet where its Raw value is an element in lines (exact match, not a substring of a value).

For some time, I was using something like:

private IQueryable<Rule> GetExistingRules() =>
    dbContext.Rules.Where(r => lines.Contains(r.Raw));

But, I have since discovered that (I think) this was not doing what I was expecting. (This method is immediately followed by inserting new Rules for all elements of lines that do not currently exist. I was getting duplicate Rules with the same Raw value...) I think, instead, I need to use .Intersect()?

I tried using a custom EqualityComparer per this, but it throws an exception.

    private IQueryable<Rule> GetExistingRules()
    {
        var lineRules = lines.Select(l => new Rule {Raw = l});
        return dbContext.Rules.Intersect(lineRules, new RuleRawEqualityComparer());
    }

    private class RuleRawEqualityComparer : IEqualityComparer<Rule>
    {
        public bool Equals(Rule x, Rule y) => x?.Raw == y?.Raw;
        ...
    }

Could not parse expression 'value(Microsoft.EntityFrameworkCore.Query.Internal.EntityQueryable`1[FilterLists.Data.Entities.Rule]).Intersect(__p_0, __p_1)': This overload of the method 'System.Linq.Queryable.Intersect' is currently not supported.

What is the best way to compose this query? Note that it is in a chain of DbContext interactions, so I'd prefer to keep the return type as an IQueryable to enable EF's lazy query composition.

Context on GitHub

Update: More info on why I suspected the Contains() approach was not working:

This is the class where the query is being used. I am seeing exceptions like below because the Raw column in the database has a unique constraint. I thought my logic (use of Except() in CreateNewRules()) would prevent any rows in Rules with duplicate Raw values, but maybe my issue lies elsewhere...

public class SnapshotBatch
{
    private readonly FilterListsDbContext dbContext;
    private readonly IEnumerable<string> lines;
    private readonly Data.Entities.Snapshot snapEntity;

    public SnapshotBatch(FilterListsDbContext dbContext, IEnumerable<string> lines,
        Data.Entities.Snapshot snapEntity)
    {
        this.dbContext = dbContext;
        this.lines = lines;
        this.snapEntity = snapEntity;
    }

    public async Task SaveAsync()
    {
        var existingRules = GetExistingRules();
        var newRules = CreateNewRules(existingRules);
        dbContext.Rules.AddRange(newRules);
        var rules = existingRules.Concat(newRules);
        AddSnapshotRules(rules);
        await dbContext.SaveChangesAsync();
    }

    private IQueryable<Rule> GetExistingRules() =>
        dbContext.Rules.Where(r => lines.Contains(r.Raw));

    private List<Rule> CreateNewRules(IQueryable<Rule> existingRules) =>
        lines.Except(existingRules.Select(r => r.Raw))
             .Select(r => new Rule {Raw = r})
             .ToList();

    private void AddSnapshotRules(IQueryable<Rule> rules) =>
        snapEntity.AddedSnapshotRules
                  .AddRange(rules.Select(r => new SnapshotRule {Rule = r}));
}

Snippet from exception StackTrace (where '###Meebo:AdElement.Root' is a sample value for Raw in the Rules table):

FilterLists.Services.Snapshot.Snapshot.TrySaveAsync() in /home/travis/build/collinbarrett/FilterLists/src/FilterLists.Services/Snapshot/Snapshot.cs:line 43 Duplicate entry '###Meebo:AdElement.Root' for key 'IX_rules_Raw' at MySql.Data.MySqlClient.MySqlDataReader.ActivateResultSet(ResultSet resultSet) in C:\projects\mysqlconnector\src\MySqlConnector\MySql.Data.MySqlClient\MySqlDataReader.cs:line 93 at

Update 2: I am fairly sure the issue I was seeing with Contains() was due to this issue that has an active PR. Because my strings have all kinds of special characters, I think they were not getting escaped properly when passed into Contains(), but they seem to be properly escaped as parameters in a Join().

Collin Barrett
  • 2,441
  • 5
  • 32
  • 53
  • Can you explain again what's the problem with `Contains` approach (because it seem to be the correct way of getting existing rules from db)? – Ivan Stoev Aug 20 '18 at 12:43
  • @IvanStoev See Update at the bottom of the question. It could be I'm asking the completely wrong question... I'm not sure. – Collin Barrett Aug 20 '18 at 13:03
  • 1
    The exception seem to be generated by a different code (`Snapshot.TrySaveAsync`). Most likely the problem is caused by `snapEntity.AddedSnapshotRules.Add[Range]` - in case `snapEntity` is attached to a different context, it is possible that change tracker of that context can consider `Rule` objects being new. – Ivan Stoev Aug 20 '18 at 13:26
  • `SnapshotBatch` uses the same `DbContext` instance as its caller `Snapshot.TrySaveAsync()`. However, since `Snapshot` spins up multiple instances of `SnapshotBatch`, I wonder if there are conflicts due to everything being async... – Collin Barrett Aug 20 '18 at 13:32
  • 1
    Note that the comparison in the database is probably case-insensitive. So if in the database you have rule `A` and `lines` contains `a` and `A` (or `a` only), you will get existing rules: `A`, lines except `A` is `a`, because C# is case-sensitive, and you'll try to insert `a`, which is a duplicate for the database. – Gert Arnold Aug 20 '18 at 14:49
  • Thanks, @GertArnold . That's a good thought. I'm a bit out of the mainstream using MySQL with EF, I know, but I thought since my table uses `COLLATE=utf8mb4_bin` that my queries would be case-sensitive. But, maybe that is not the case? Will try some testing on that... – Collin Barrett Aug 20 '18 at 14:52
  • All, see "Update 2" in question. I think this whole issue had to do with character escaping. Still validating, but that theory seems initially to have been the issue. – Collin Barrett Aug 20 '18 at 20:37

2 Answers2

2

Do not forget that when you use linQ with EFCore and IQueryable, it translate the c# code in Sql statements.

Did you tried this ?

var query = from rule in dbContext.Rules
            join line in lines
                on rule.Raw equals line
            select rule;
Collin Barrett
  • 2,441
  • 5
  • 32
  • 53
Zysce
  • 1,200
  • 1
  • 10
  • 35
  • Initial testing seems to show this working better than `.Contains()`, going to do more full testing which will take some time. I'm using the R#-converted method chain syntax `dbContext.Rules.Join(lines, rule => rule.Raw, line => line, (rule, line) => rule)`, but should be equivalent. – Collin Barrett Aug 20 '18 at 13:30
  • Wondering if `Join()` will work better than `Contains()` since I am dealing with large datasets per [this](https://stackoverflow.com/questions/40650720/performance-join-vs-contains-linq-to-entities#comment68534417_40651241)... – Collin Barrett Aug 20 '18 at 14:03
  • Though `Contains()` should work as well, marking this as accepted as a valid alternative that I am using at least until the issue mentioned in `Update 2` in my question is resolved. Thanks! – Collin Barrett Aug 20 '18 at 20:38
2

You wrote:

Give me all of the Rules from the DbSet where its Raw value is an element in lines (exact match etc.)

Your first solution will give the desired result:

var requestedRules = dbContext.Rules
   .Where(rule => lines.Contains(rule));

In words: from the collection of Rules, select only those Rules that have a Raw value that equals one of the values in the sequence of lines.

I was getting duplicate Rules with the same Raw value...)

Well apparently your source collection has Rules with the same Raw value!

If you want only unique Raw values you'll have to decide what to do with duplicates:

Id  Raw
 1  "Hello"
 2  "Hello"

Which one do you want? The first? the last? Both? None? Any?

Let's go for Any: we'll make groups of Rules with same Raw value, and from every Group we take the first one (or if you want the last, after all we don't care. But that is a bit less efficient.

var result = dbContext.Rules
   .Where(rule => lines.Contains(rule))
   .GroupBy(rule => rule.Raw)
   .Select(group => group.FirstOrDefault());

In words: from the collection of Rules, select only those Rules that have a Raw value that equals one of the values in the sequence of lines. From the remaining elements make groups of Rules with same Raw value. Then from every Group take any element, for example the first one.

If you want all / only the first / only the last, you'll know what to do by now.

Harald Coppoolse
  • 28,834
  • 7
  • 67
  • 116
  • Thanks, Harald. Under "Update" in my question, I mentioned that "the `Raw` column in the database has a unique constraint". So, there aren't duplicate `Raw` values already in the database (as evidenced by the exception at the bottom of my question). I suspect that my issue has something to do with multiple instances of `SnapshotBatch` sharing `dbContext` running in `async` per [this](https://stackoverflow.com/questions/51930802/get-all-entities-from-dbsettentity-with-a-property-value-that-is-in-an-ienumer#comment90813655_51930802). – Collin Barrett Aug 20 '18 at 14:24