1

Using a function to return a list of Contacts based on a List of ID values

The parameter is a list of distinct integer values / contact IDs and size is varied from smaller numbers to bigger numbers [ usually in range of 1000 - 100000]

Approximately 500000 records in Contacts table and list is growing

At present this query is taking ~10 - 15 seconds for the completion

    public void FetchFromIDList(List<int> contactIds)
    {
        try
        {

            IQueryable<Contact> query = context.Contacts
                                    .Include(c => c.Company)
                                    .Include(c => c.Addresses)
                                        .ThenInclude(a => a.Country);

            query = query.Where(c => contactIds.Contains(c.Id)).Take(10);
        }
        catch{}
    }

Is there any way to improve perfomance of Contains ?

After some research i tries using Hashset , but not much difference on perfomace

         //https://stackoverflow.com/questions/18651940/performance-benchmarking-of-contains-exists-and-any
         HashSet<int> setIds = new HashSet<int>(contactIds);

Some articles mentioned about using temp table and insert to temp table and do a join , but i am wondering will it be extra works to insert first and then select and i am using Entity Framework

Is there any other approaches i can try ? I am using Server side pagination and usually page size is 10 and that is why using Take(10)

Sebastian
  • 4,625
  • 17
  • 76
  • 145

1 Answers1

1

There also is a maximum number of items you can pass for EF to compose an IN() clause from to consider as well. Contains is fine for smaller sets, but I would explore the underlying reason why such a method would be called with 1000's or more Ids.

Chunking large sets is more around getting around the size limit, performance will still stink. If this is something like a batch processing op I would look at importing the Ids /w a session Id token into a staging table and joining that to the Contact.

Code like this:

query = query.Where(c => contactIds.Contains(c.Id)).Take(10);

should also use an OrderBy clause, and would need to support Skip as well as Take to perform pagination, otherwise you're passing 1000+ IDs in just to take 10 results.

A simple solution to this specific problem is that since you are passing IDs in and you want to paginate your results, paginate your ID list first:

var pageOfIds = contactIds
    .OrderBy(x => x)
    .Skip(pageNumber * pageSize)
    .Take(pageSize);

var contacts = context.Contacts
    .Include(c => c.Company)
    .Include(c => c.Addresses)
        .ThenInclude(a => a.Country)
    .Where(c => contactIds.Contains(c.Id))
    .ToList();

This way if your page size is 10 you are always only fetching 10 entities. This assumes that the Contact IDs passed in are all valid, existing Contact IDs. You could get less than the expected page size if the list can contain garbage.

If this is pre-filtering to get contact Ids and the result of some code separated into silos, such as Generic Respositories where you have one set of code querying one or more entities to gather ContactIds, then going to a ContactRepository to fetch those contacts by Id, this would be better handled by projecting down to the Contacts using Select or SelectMany and applying your pagination. It would really depend on seeing the real code rather than the provided example.

Steve Py
  • 26,149
  • 3
  • 25
  • 43
  • In my case all Id values are valid I'd values from database which is stored as a part of some previous operations. I will try to add order by and skip in combination with take and might be performance improve . – Sebastian Dec 23 '21 at 21:46
  • 1
    Doing the pagination on the list (or passing in just a page worth of IDs) will improve the performance considerably since the IN clause would just have the page worth of IDs. `OrderBy` is required to ensure the values are repeatable. EF does not apply any default sorting. – Steve Py Dec 23 '21 at 21:56