1

I want to find the duplicate values between a list and a db table with 1 db call. The list contains 1000+ rows and db table contains 100k+ rows. Is it possible?

The list structure:

public class BatchInvitationDto
{
    public List<Candidates> Candidate { get; set; }
    public string InterviewId { get; set; }
}

public class Candidates
{
    public string Name { get; set; }
    public string Email { get; set; }
}  

and the db structure:

public class Invitations
{
    [Key]
    [DatabaseGenerated(DatabaseGeneratedOption.Identity)]
    public int Id { get; set; }

    [Required(ErrorMessage = "Interview ID is required")]
    public int InterviewId { get; set; }  

    [Required(ErrorMessage = "Candidate Name is Required")]
    public string CandidateName { get; set; }

    [Required(ErrorMessage = "Candidate Email is Required")]
    public string CandidateEmail { get; set; }

    [StringLength(450)]
    [Index(IsUnique = true)]
    public string Guid { get; set; }

}  

now I have 1000+ rows in the list

BatchInvitationDto batchInvites = new BatchInvitationDto(); 
batchInvites.Candidate  // contains 1000+ rows  

and I have 100k+ rows in the db table context.Invitations. Can I check for duplicate emails (emails that exists both in the db and the list variable) with one call? Or what is the best way to do that?

Wahid Masud
  • 993
  • 10
  • 35
  • How do you define "duplicated", same InterviewId or same Name or same Mail or both? – WildWind Feb 23 '19 at 07:54
  • same email only. – Wahid Masud Feb 23 '19 at 08:02
  • It can be done with 1 db call, but will be very slow, where CandidateEmail in (1000+ records). I suggest that loop at list's email, which means 1000+ db calls, but it's much more faster. – WildWind Feb 23 '19 at 08:25
  • yes it seems really slow. i tried @rad's answer. I used `.AsEnumerable()` before `.Where()` which means the cross matching should have happened inside memory. but currently the list contains 1058 records and db contains 315 rows. and it is taking around 9 seconds every time. can you explain that? – Wahid Masud Feb 23 '19 at 08:42

1 Answers1

1

Duplicates regarding Email:

var lst = batchInvites.Candidate.Select(y => y.Email);
db.Invitations
    .Where(x => lst.Contains(x.Email))
    .ToList();
Wai Ha Lee
  • 8,598
  • 83
  • 57
  • 92
Amir Molaei
  • 3,700
  • 1
  • 17
  • 20
  • a small mistake. `.AsEnumerable()` is required before `Where()`. but it is really slow. – Wahid Masud Feb 23 '19 at 08:39
  • @WahidMasud: If you add AsNumerable, Although it performs the query by only one db call, it reads the entire Invitations table into the RAM which is a performance issue (That's why it's slow). However, if it's Queryable it only reads the required data and it will still use only one db call. – Amir Molaei Feb 23 '19 at 08:46
  • If I dont use `AsEnumerable()` then I get this error `Unable to create a constant value of type Application.Dto.Candidates Only primitive types or enumeration types are supported in this context` this answer explains that https://stackoverflow.com/questions/18929483/unable-to-create-a-constant-value-of-type-only-primitive-types-or-enumeration-ty – Wahid Masud Feb 23 '19 at 08:48
  • @mjwills this query puts all the 1058 emails in a `Where` clause. From the link you provided, it says max parameter number can be 2100. is this the problem you were pointing at? I tried again with 3175 emails and it works. – Wahid Masud Feb 23 '19 at 09:56
  • Awesome @WahidMasud - I am surprised it works (since there are more than 2100 parameters). But great to hear that it does! – mjwills Feb 23 '19 at 10:08
  • https://stackoverflow.com/questions/8898564/entity-framework-hitting-2100-parameter-limit – Wahid Masud Feb 23 '19 at 10:18
  • Interesting - I shall have to investigate further to see how they avoid SQL Injection issues doing it like that. – mjwills Feb 23 '19 at 11:42