Comparing two lists for duplicates in either of them in C#

Question

I have two Lists like so:

List<EmpData> colExistingEmpData;
List<EmpData> colExternalEmpData;

Each of them will have employee records that share the same Id. I know this sounds weird, but that's the real situation I am in right now!

For every employee in colExternalEmpData, a check is made against colExistingEmpData based on the employee's Id:

foreach (EmpData employee in colExternalEmpData)
{
  var queryResult = colExistingEmpData.FindAll(thisEmployee => thisEmployee.Id == employee.Id);

  if (queryResult.Count == 0)
  {
    // Mark as INSERT
  }
  else if (queryResult.Count == 1)
  {
    // Mark as UPDATE
  }
  else // queryResult has more than one match
  {
    // data is duplicated, mark as IGNORE
  }

  analysedData.Add(employee);
}

This works fine when colExistingEmpData has no duplicated value for the 'Id'

When there are duplicates in colExternalEmpData, for example two employees with the same Id of 123, the above code will still mark the employee with Id 123 as UPDATE, because it finds exactly one match in colExistingEmpData (provided colExistingEmpData has just one record with that Id).

Is there a way in which an employee record can be marked as 'IGNORE' when it's repeated in either of the sources?

I can't use a Dictionary object. I had used it before, but the powers that be didn't like the idea.

Regards.

What do you mean by "mark as IGNORE"? Why can't you have a HashSet with all the Ids that are duplicates? Also, your query will run two times here. Use ToList to only run it once ;) – Evelie Feb 27 '13 at 12:08
Is this your exact code? `FindAll` returns a list, not an integer, doesn't it? – YavgenyP Feb 27 '13 at 12:12
"I had used it before but the powers that be didn't like the idea." That's something new, I must admit. Do your peers also prefer apps to be slow and hanging? A Dictionary would yield the fastest results in this case (at least asymptotically). – vgru Feb 27 '13 at 12:16
@Groo you are very generous when using the words slow and hanging for my peers; they prefer giving marching orders instead. – Codehelp Feb 27 '13 at 12:20
@Groo, I have to say I understand where he's coming from. I don't know if you've ever worked in an extremely large organization, but there are things that just aren't allowed, regardless. Those things generally don't make sense; it's really more about somebody flaunting the position they have in the organization than anything else. They don't understand it, so they quash it, and it makes them feel bigger and badder. Big organization politics, that's it. – Mike Perrenoud Feb 27 '13 at 12:22
@Michael: I guess you're right. I've had the luck of working at places where good programming practices (well, common sense at least) are encouraged. If that's a big bad organization, then I would probably be compelled to create a big bad presentation comparing the performance of a list lookup vs a dictionary lookup. Dogmatic rules such as this can only lower effectiveness and ultimately lower the product quality. – vgru Feb 27 '13 at 12:31
@Groo, if I had a nickel for every time I've felt the same way and then realized you can't change things with rational arguments, I'd be rich. I've worked in very small companies and I've worked in very large companies (from 25 to 25,000), and in the very large companies you must play the political game to get to the top (where you can make decisions like that), but then you must play politics to stay on top (which keeps you from making decisions like that). Man, it's tough! LOL – Mike Perrenoud Feb 27 '13 at 13:57

score 2 · Accepted Answer · answered Feb 27 '13 at 12:11

Consider just adding a processed list to the equation:

List<int> processed = new List<int>();

and then at the top of the loop add this code:

if (processed.Contains(employee.Id)) { continue; }
processed.Add(employee.Id);

Do that before you check the other list. It is the first thing in the loop because, once an Id has already been processed, you don't care about any further record with that Id.
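Put together with the loop from the question, the idea above might look like the following minimal sketch. The `EmpAction` enum, the `Action` property, and the `EmpAnalyser` class are hypothetical names introduced only for illustration; the question does not say how a record is actually marked.

```csharp
using System.Collections.Generic;

// Hypothetical marker values; the question only names INSERT, UPDATE and IGNORE.
public enum EmpAction { Insert, Update, Ignore }

public class EmpData
{
    public int Id { get; set; }
    public EmpAction Action { get; set; } // assumed property, not in the original
}

public static class EmpAnalyser
{
    public static List<EmpData> Analyse(
        List<EmpData> colExistingEmpData,
        List<EmpData> colExternalEmpData)
    {
        var analysedData = new List<EmpData>();
        var processed = new List<int>(); // Ids already seen in the external list

        foreach (EmpData employee in colExternalEmpData)
        {
            // Skip any later record whose Id was already handled.
            if (processed.Contains(employee.Id)) { continue; }
            processed.Add(employee.Id);

            int matches = colExistingEmpData
                .FindAll(existing => existing.Id == employee.Id).Count;

            if (matches == 0) { employee.Action = EmpAction.Insert; }
            else if (matches == 1) { employee.Action = EmpAction.Update; }
            else { employee.Action = EmpAction.Ignore; }

            analysedData.Add(employee);
        }

        return analysedData;
    }
}
```

Note that in this sketch the first record with a repeated Id is still classified as INSERT or UPDATE; only the later repeats are skipped. If every record with a repeated Id has to be marked IGNORE, the external list needs to be counted as well.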

answered Feb 27 '13 at 12:11 by Mike Perrenoud

This would be equivalent to just putting continue; in his last else. I think he wants to mark the duplicates for some reason, not just skip them. – Evelie Feb 27 '13 at 12:13
@Evelie, the issue is that the employee shows up twice in the first list, the one being iterated. – Mike Perrenoud Feb 27 '13 at 12:15

YavgenyP · Answer 2 · 2013-02-27T12:23:28.607

Assuming I understand your problem correctly, you can always sort your external list by employee Id and then, instead of a foreach loop, use a while loop and skip employees while the Id stays the same.

This is roughly how the code should look, reduced to integers:

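List<int> external = new List<int>() { 1, 2, 2, 5, 1, 3 };
List<int> internalList = new List<int>() { 1, 4, 5, 3 };
external.Sort(); // equal ids are now adjacent
int index = 0;
int item = -1; // sentinel: assumes no real id is -1
while (index < external.Count)
{
    // Only look up an id the first time it appears in the sorted list
    if (external[index] != item)
    {
        item = external[index];
        List<int> matches = internalList.FindAll(t => t == item);
        // matches.Count decides INSERT (0), UPDATE (1) or IGNORE (more than 1)
    }
    index++;
}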
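Because sorting puts equal ids next to each other, the same loop can also count how often each id occurs in the external list, which is what the question needs in order to mark an id as IGNORE when it repeats in either source. The following is only a sketch of that variation, not the answerer's code; `SortedRunClassifier`, `Classify`, and the "id:ACTION" string format are hypothetical choices made for illustration.

```csharp
using System.Collections.Generic;

public static class SortedRunClassifier
{
    // Returns one "id:ACTION" string per distinct id in the external list.
    public static List<string> Classify(List<int> external, List<int> internalList)
    {
        var sorted = new List<int>(external); // copy, so the caller's order is untouched
        sorted.Sort();

        var result = new List<string>();
        int index = 0;
        while (index < sorted.Count)
        {
            int item = sorted[index];

            // Count the run of equal ids; this also advances index past the run.
            int runLength = 0;
            while (index < sorted.Count && sorted[index] == item)
            {
                runLength++;
                index++;
            }

            int matches = internalList.FindAll(t => t == item).Count;

            string action;
            if (runLength > 1 || matches > 1) { action = "IGNORE"; } // repeated in either list
            else if (matches == 1) { action = "UPDATE"; }
            else { action = "INSERT"; }

            result.Add(item + ":" + action);
        }

        return result;
    }
}
```

With the two lists from the snippet above, ids 1 and 2 occur twice in the external list and come out as IGNORE, while 3 and 5 each have exactly one match and come out as UPDATE.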

score 0 · Answer 3 · edited May 23 '17 at 11:57

I don't know if this would help, but why not merge the two lists and remove duplicates:

var mergedList = list1.Union(list2).ToList();

You should check out this question:

Create a list from two object lists with linq
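As a small illustration of what `Union` does, here is a sketch reduced to integers; `UnionDemo` and `Merge` are hypothetical names. For a class such as EmpData, `Union` compares with the type's default equality, so it would presumably only treat two records with the same Id as duplicates if EmpData overrides Equals and GetHashCode or a custom comparer is passed.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class UnionDemo
{
    // Merges two lists and keeps each distinct value once,
    // in the order in which it is first encountered.
    public static List<int> Merge(List<int> list1, List<int> list2)
    {
        return list1.Union(list2).ToList();
    }
}
```

Calling `UnionDemo.Merge` with { 1, 2, 2, 5 } and { 5, 3, 1 } gives 1, 2, 5, 3. The result no longer shows which values were repeated or which list they came from.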

answered Feb 27 '13 at 12:12 by Derek · edited May 23 '17 at 11:57 by Community

Essentially the OP needs to know, row by row, how to process it. Joining the two lists, though it would get rid of the duplicates, wouldn't really address the need to know how to process each row based on its existence in the second list. – Mike Perrenoud Feb 27 '13 at 12:14

It took me 5 attempts to understand that question lol. I take your point, you're correct. – Derek Feb 27 '13 at 12:19

score 0 · Answer 4 · answered Feb 27 '13 at 12:28

Another way would be to override Equals and GetHashCode and use .Distinct().

MSDN
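A minimal sketch of that approach, assuming two employee records count as equal when their Ids match. The `Name` property and the `DistinctDemo` helper are hypothetical and only there to make the example complete.

```csharp
using System.Collections.Generic;
using System.Linq;

public class EmpData
{
    public int Id { get; set; }
    public string Name { get; set; } // hypothetical extra field

    // Two records are considered equal when their Ids match (assumption).
    public override bool Equals(object obj)
    {
        var other = obj as EmpData;
        return other != null && other.Id == Id;
    }

    // Must be consistent with Equals: equal Ids give equal hash codes.
    public override int GetHashCode()
    {
        return Id.GetHashCode();
    }
}

public static class DistinctDemo
{
    // Keeps the first record seen for each Id and drops later ones.
    public static List<EmpData> DistinctById(List<EmpData> employees)
    {
        return employees.Distinct().ToList();
    }
}
```

Like the merge suggestion, this removes the repeats rather than marking them, so it fits best as a clean-up step before the INSERT/UPDATE check. Since the hash code depends on a settable Id, the Id should not change while a record sits in a hash-based collection.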

answered Feb 27 '13 at 12:28 by happygilmore
