0

I have two list different types (because they represents two different types in two different databases):

public partial class PersonOne
{
    public int id { get; set; }
    public string name { get; set; }
    public string surname { get; set; }
    public string address { get; set; }
    public string phone { get; set; }
}

public partial class PersonTwo
{
    public int id { get; set; }
    public string firstname { get; set; }
    public string lastname { get; set; }
    public string email { get; set; }
}

I want to eliminate duplicates with the same firstname and lastname on the List and then take from that list objects which have got different firstname and lastname than objects on the List. Firstname in the class PartTwo = name in the class PartOne and lastname in the class PartTwo = surname in the class PartTwo.

I have that code but it is very slow:

List<PersonOne> personOneList = new List<PersonOne>(); // very big list 
List<PersonTwo> personTwoList = new List<PersonTwo>(); // very big list

List<PersonTwo> difference = personTwoList
    .GroupBy(x => new { FirstName = x.firstname.ToLower(), LastName = x.lastname.ToLower() })
    .Select(x => x.First())
    .Where(x => !personOneList.Any(y => y.name.Equals(x.firstname, StringComparison.InvariantCultureIgnoreCase) && y.surname.Equals(x.lastname, StringComparison.InvariantCultureIgnoreCase)))
    .ToList();
Alexander
  • 31
  • 5
  • 1
    Why do you have two distinct `Person` types? – Yuval Itzchakov Apr 26 '15 at 09:19
  • 3
    Well, you can try to sort those lists by name-surname pair and then do stream distinct. It will be `O(NOne*Log(NOne) + NTwo*Log(NTwo)`, while now you have `O(NTwo*NOne)`. Or you can try to use [`HashSet.Except`](https://msdn.microsoft.com/en-us/library/bb908036.aspx) that will probably work even better. – Eugene Podskal Apr 26 '15 at 09:22
  • I have two distinct Person types because I have to merge records from two databases. – Alexander Apr 26 '15 at 10:00
  • Isn't this a duplicate of http://stackoverflow.com/questions/29865276/how-can-i-take-objects-from-the-second-set-of-objects-which-dont-exist-in-the-f/29865770#29865770? If you need more details, ask again on the original question. – Iravanchi Apr 26 '15 at 10:16

2 Answers2

1

Try this:

        var HashTable = new Dictionary<Tuple<String,String>,Object>();

        foreach (PersonOne person in personOneList)
        {
            var personTuple = Tuple.Create(person.name, person.surname);
            if (!HashTable.ContainsKey(personTuple))
            {
                HashTable[personTuple] = person;
            }
        }
        foreach (PersonTwo person in personTwoList)
        {
            var personTuple = Tuple.Create(person.firstname, person.lastname);
            if (!HashTable.ContainsKey(personTuple)) {
                HashTable[personTuple] = person;
            }
        }


        var myResult = HashTable.Where(x => x.Value is PersonTwo).Select(x => x.Value).Cast<PersonTwo>().ToList();

The HashTable (Dictionary) simplifies the job of (a) excluding people of type PersonOne from the list, and (b) removing duplicates of person two.

Thirdly, it works in O(N) time, not O(N^2).

David E
  • 1,384
  • 9
  • 14
  • Actually, I've just seen this answer - which I'd recommend above mine, as it's a little shorter (and actually a little faster, as it doesn't require putting the PersonOne into memory). Thirdly, it's a little cleaner as to what the code is doing! :) - http://stackoverflow.com/a/29865393/3940783 – David E Apr 26 '15 at 11:34
  • (the Except works over tuples of first name, last name, so your concern there isn't an issue - so I definitely recommend theirs above mine! - and the general overview answer too.) I didn't realise your other thread existed when I posted. In the future, there's no need to post the same question twice - just edit the previous one. But glad this helps. – David E Apr 26 '15 at 11:39
0

First of all I would recommend you to use single class for the Person. If there are differences then you make parent class as Person and inherit PersonOne and PersonTwo from Person.

For your existing design I will recommend you to use IEnumerable instead of List. Have look

Stopwatch sw = new Stopwatch();
sw.Start();

List<PersonTwo> difference = personTwoList
.GroupBy(x => new { FirstName = x.firstname.ToLower(), LastName = x.lastname.ToLower() })
.Select(x => x.First())
.Where(x => !personOneList.Any(y => y.name.Equals(x.firstname, StringComparison.InvariantCultureIgnoreCase) && y.surname.Equals(x.lastname, StringComparison.InvariantCultureIgnoreCase)))    
.ToList();

        sw.Stop();

        Console.WriteLine("Time elapsed: {0}", sw.ElapsedTicks);//took 83333ms

        Stopwatch sw1 = new Stopwatch();
        sw1.Start();

        IEnumerable<PersonTwo> difference1 = personTwoList
            .GroupBy(x => new { FirstName = x.firstname.ToLower(), LastName = x.lastname.ToLower() })
            .Select(x => x.First())
            .Where(x => !personOneList.Any(y => y.name.Equals(x.firstname, StringComparison.InvariantCultureIgnoreCase) && y.surname.Equals(x.lastname, StringComparison.InvariantCultureIgnoreCase)));

        sw1.Stop();

        Console.WriteLine("Time elapsed: {0}", sw1.ElapsedTicks);//took 9ms
        Console.ReadLine();

Result was based on following generated test data

for (int i = 0; i < 500; i++)
        {
            personOneList.Add(new PersonOne
                {
                    surname = "a" + i,
                    name = "b"+ i
                });

            personTwoList.Add(new PersonTwo
            {
                lastname = "a" + i,
                firstname = "b" + i
            });
        }

        for (int i = 0; i < 100; i++)
        {
            personTwoList.Add(new PersonTwo
            {
                lastname = "c" + i,
                firstname = "d" + i
            });
        }

        for (int i = 0; i < 100; i++)
        {
            personTwoList.Add(new PersonTwo
            {
                lastname = "a" + i,
                firstname = "b" + i
            });
        }
Shark644
  • 21
  • 4
  • 1
    Isn't this because of deferred execution? The algorithmic complexity is still the same. – Martin Smith Apr 26 '15 at 09:53
  • Yes @MartinSmith you are right. But when you will use IEnumberable for process on each item then with foreach then it will overall improve the efficiency. – Shark644 Apr 26 '15 at 11:12