2

I want to check if any message already exists before adding it to database, but my current query loads the entire table into memory. Query generated from my code is basically just select * from tableName.

How can I rewrite this query to be evaluated in database?

public void AddMessages(IEnumerable<Message> messages)
{
    if (messages == null)
        throw new ArgumentNullException(nameof(messages));

    var duplicates = (from currMsg in context.Messages
                      where messages.Any(msg =>
                                                msg.Prop1 == currMsg.Prop1 &&
                                                msg.Prop2 == currMsg.Prop2 &&
                                                msg.Prop3 == currMsg.Prop3)
                      select currMsg);

    var messagesWithoutDuplicates = messages.Except(duplicates);

    context.Messages.AddRange(messagesWithoutDuplicates);
    context.SaveChanges();
}

I could also run it in a loop, but then I would create many db calls instead of 1 and I would prefer to do this in a single call.

FCin
  • 3,804
  • 4
  • 20
  • 49
  • See https://stackoverflow.com/questions/36514355/applying-linq-filters-based-on-a-multi-dimensional-array/36515245#36515245, you need something similar to produce `||` based filter. – Ivan Stoev Aug 24 '18 at 08:36
  • @IvanStoev Thanks, now I'm starting to wonder if this is worth the effort. I suspect unit testing this will not be easy neither. – FCin Aug 24 '18 at 08:46
  • Since IEnumerable messages is on the client, I don't think there's a way to evaluate the .Any() on the server. If context.Messages is a huge table, is it faster to do multiple db calls and check each individual message? – linden.phoenix Aug 24 '18 at 08:46
  • Well, `||` based filters work fine with any LINQ dialect, so there should not be unit testing issue if you use LINQ to Object provider for mocking the queryable. But anyway, in general your current query is just fine. It's current EF Core query translator failure to not translate it somehow to server side evaluation. Which currently is the biggest IMO defect of EF Core. – Ivan Stoev Aug 24 '18 at 08:59
  • You shouldn't be prevented by ORM of doing things which are possible in SQL Server. `MERGE` looks like _right tool for the job_ in your case. – Fabio Aug 24 '18 at 08:59

3 Answers3

0

you can use SELECT COUNT(*) FROM TABLE if you wish to check how many rows in table. execute this query before you do your task.

or if you wish to update if row cannot be inserted ( duplicated ) you need to use merge-insert for that.

Merge Insert (MySql) => https://dev.mysql.com/doc/refman/8.0/en/insert-on-duplicate.html

Merge Insert (Oracle) => https://docs.oracle.com/cd/B28359_01/server.111/b28286/statements_9016.htm#SQLRF01606

Arphile
  • 841
  • 6
  • 18
  • I don't need number of rows. I need to check if record already exists before inserting it. If you have a solution to this, please post sample code explaining your solution. – FCin Aug 24 '18 at 08:18
  • before I answering you, I need to check if I understand correctly. a sample table has index as key. and table has 4 rows which contains 1 2 3 4 each. if you try to insert 3 into this table, dupkey will be occurred. and you wish to check if 3 is already exists. is it right? if then you need to use Merge Insert as I posted here. – Arphile Aug 24 '18 at 08:31
0

Depending on your use-case, you may need to insert them one-by-one and trust on the database unique-index (you have one, right?) to throw it back in your face if it's a duplicate.

There is two weaknesses in your code besides memory consumption: concurrency (what if somebody else inserts while you check for duplicates) and the fact that your records to insert might themselves be duplicates that you did not check for.

nvoigt
  • 75,013
  • 26
  • 93
  • 142
  • I could wrap it in transaction and lock the table. With adding many messages in a loop I might get incomplete result if there is connection error during inserts. – FCin Aug 24 '18 at 08:56
  • You could still wrap it in a transaction. But sure, if you have the chance to lock the table, that would work too. – nvoigt Aug 24 '18 at 08:57
0

Since there is no easy way of doing this in a single call I decided to sacrifice performance and retain readability and testability. This is my solution:

using (var transaction = context.Database.BeginTransaction())
{
    try
    {
        foreach (var message in messages)
        {
            var exists = context.Messages.Any(msg => msg.Prop1 == message.Prop1 &&
                                                     msg.Prop2 == message.Prop2 &&
                                                     msg.Prop3 == message.Prop3 &&);

            if (!exists)
            {
                context.Messages.Add(message);
            }
        }

        context.SaveChanges();
        transaction.Commit();
    }
    catch (Exception ex)
    {
        _logger.Error(ex);
        transaction.Rollback();
        throw;
    }
}
FCin
  • 3,804
  • 4
  • 20
  • 49