2

I have two tables in a one-to-many relationship, represented in this example by Customers (one) and CustomersInfo (many).

using System.Collections.Generic;

namespace Domain.Entities
{
    public class Customers
    {
        public Customers(int id, string name)
        {
            Id = id;
            Name = name;
        }

        public int Id { get; set; }
        public string Name { get; set; }
        public IEnumerable<CustomersInfo> CustomersInfo { get; set; }
    }

    public class CustomersInfo
    {
        public CustomersInfo(int customerId, string email)
        {
            CustomerId = customerId;
            Email = email;
        }

        public CustomersInfo(int customerId, string email, string name)
        {
            CustomerId = customerId;
            Email = email;
            Name = name;
        }

        public int CustomerId { get; set; }
        public string Email { get; set; }
        public string Name { get; set; }
        public Customers Customer { get; set; }
    }
}

In the real scenario, I have to read an .xlsx file with two columns, Name and Email. To populate the Customers table, I group the rows by name, select the first row of each group, and insert those into Customers, so every name in Customers is distinct. After that, I read the .xlsx file again and compare the names already inserted in Customers against the values in the file's Name column; wherever the names match, CustomersInfo.CustomerId receives the corresponding Customers.Id. The code below represents this:
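The deduplication step described above can be sketched with LINQ. This is a minimal sketch, not the real import code: the `Row` record and the sequential-id assignment are assumptions standing in for the spreadsheet reader and the database insert.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DedupSketch
{
    // Illustrative row type standing in for one spreadsheet line (assumption).
    public record Row(string Name, string Email);

    // Group the file's rows by name and keep one customer per distinct name,
    // assigning sequential ids as the real insert would.
    public static List<(int Id, string Name)> DistinctCustomers(IEnumerable<Row> rows) =>
        rows.GroupBy(r => r.Name)
            .Select((g, i) => (Id: i + 1, Name: g.Key))
            .ToList();

    public static void Main()
    {
        var rows = new List<Row>
        {
            new("Gustavo", "gustavo@costa.com"),
            new("Gustavo", "gustavo@mcosta.com"),
            new("Lilian", "lilian@costa.com"),
        };
        foreach (var (id, name) in DistinctCustomers(rows))
            Console.WriteLine($"{id} - {name}");
    }
}
```

`GroupBy` yields groups in the order their keys first appear, so the ids follow the file order of first occurrences.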

using Domain.Entities;
using System;
using System.Collections.Generic;

namespace Domain
{
    public static class Program
    {
        public static void Main()
        {
            IEnumerable<Customers> customers = new List<Customers>() {
                new Customers(1, "Gustavo"),
                new Customers(2, "Lilian"),
                new Customers(3, "Ruan"),
                new Customers(4, "Zeze"),
                new Customers(5, "Augusto"),
                new Customers(6, "Maicon"),
                new Customers(7, "Nadia")
            };

            IEnumerable<CustomersInfo> customersInfos = new List<CustomersInfo>()
            {
                new CustomersInfo(0, "gustavo@costa.com", "Gustavo"),
                new CustomersInfo(0, "gustavo@mcosta.com", "Gustavo"),
                new CustomersInfo(0, "lilian@costa.com", "Lilian"),
                new CustomersInfo(0, "lian@costa.com", "Lilian"),
                new CustomersInfo(0, "ruan@co3sta.com", "Ruan"),
                new CustomersInfo(0, "ruan@c2osta.com", "Ruan"),
                new CustomersInfo(0, "ruan@1costa.com", "Ruan"),
                new CustomersInfo(0, "zeze@costa.com", "Zeze"),
                new CustomersInfo(0, "zeze@sscosta.com", "Zeze"),
                new CustomersInfo(0, "austo@lentini.com", "Augusto"),
                new CustomersInfo(0, "austo@lib.com", "Augusto"),
                new CustomersInfo(0, "maicon@22coa.com", "Maicon"),
                new CustomersInfo(0, "nadia@22c.com", "Nadia"),
            };

            var ci = new List<CustomersInfo>();

            foreach (var customer in customers)
            {
                foreach (var customerInfo in customersInfos)
                {
                    if (customer.Name == customerInfo.Name)
                    {
                        ci.Add(new CustomersInfo(customer.Id, customerInfo.Email));
                    }
                }
            }

            ci.ForEach(x =>
            {
                Console.WriteLine(string.Concat(x.CustomerId, " - ", x.Email));
            });
        }
    }
}

The problem is that the .xlsx file has about 7,000 records. After the grouping used to select the distinct customer names, the Customers table has about 3,000 rows, which are then compared against the 7,000 records in the file, so in the end 3,000 × 7,000 comparisons are needed, which obviously makes the program very slow. One way to improve performance might be to sort the customers alphabetically, split them into two lists around the middle item, and make comparisons only within the relevant list. Ideally, in the comparison below, customerInfo.Name would not be compared with customer.Name at all when the latter starts with a letter that precedes the initial letter of customerInfo.Name.

if (customer.Name == customerInfo.Name)

However, I have no idea how to implement this. Can anyone help?

Costa.Gustavo
  • If you need more control over how the comparison takes place, you can implement `IEquatable<T>`. See https://stackoverflow.com/questions/1307493/is-there-a-complete-iequatable-implementation-reference – Robert Harvey Jul 31 '20 at 15:12

2 Answers

1

You can use LINQ's GroupBy and ToDictionary methods:

var dict = customersInfos
    .GroupBy(ci => ci.Name)
    .ToDictionary(g => g.Key);
foreach (var customer in customers)
{
    if(dict.ContainsKey(customer.Name))
    {
         ci.AddRange(dict[customer.Name].Select(cInf => new CustomersInfo(customer.Id, cInf.Email)));
    }
}

That should reduce the number of operations from the original O(customers.Length * customersInfos.Length) to O(customers.Length + customersInfos.Length).
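For reference, `ToLookup` gives an equivalent sketch with the same complexity; a lookup returns an empty sequence for a missing key, so the `ContainsKey` check disappears. This is a self-contained illustration with inline sample data, not the asker's actual code:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LookupJoin
{
    // Illustrative record types standing in for the question's entities.
    public record Customer(int Id, string Name);
    public record Info(string Name, string Email);

    // Hash-based join: build the name -> emails index once, then probe it
    // once per customer, giving O(customers + infos) total work.
    public static List<(int CustomerId, string Email)> Join(
        IEnumerable<Customer> customers, IEnumerable<Info> infos)
    {
        var byName = infos.ToLookup(i => i.Name, i => i.Email);
        return customers
            .SelectMany(c => byName[c.Name], (c, email) => (c.Id, email))
            .ToList();
    }

    public static void Main()
    {
        var customers = new List<Customer> { new(1, "Gustavo"), new(2, "Lilian") };
        var infos = new List<Info>
        {
            new("Gustavo", "gustavo@costa.com"),
            new("Gustavo", "gustavo@mcosta.com"),
            new("Lilian", "lilian@costa.com"),
        };
        foreach (var (id, email) in Join(customers, infos))
            Console.WriteLine($"{id} - {email}");
    }
}
```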

Guru Stron
1

Try an algorithmic approach by finding the bottleneck. The bottleneck is that, for each customer, the whole customersInfos list is scanned to find the matching emails, which takes O(n) per customer.

Finding the emails for a given customer.Name can instead be done with a hash-based data structure that maps each name to its emails (e.g. a lookup); each search by customer.Name then takes O(1) on average:

var byName = customersInfos.ToLookup(x => x.Name, x => x.Email);
foreach (var customer in customers)
    foreach (var email in byName[customer.Name])
        ci.Add(new CustomersInfo(customer.Id, email));
Fedor