I am trying to combine the search results from two separate directories: one is LDAP, the other is a database.

var ldapResults = new List<LdapResult>();
var databaseResults = new List<DatabaseResult>();

class LdapResult
{
    public string sso { get; set; }
    public string name { get; set; }
    public string mail { get; set; }
}

class DatabaseResult
{
    public string sso { get; set; }
    public string name { get; set; }
    public string legacyAccountName { get; set; }
    public string phone { get; set; }
}

class SearchResult
{
    public string sso { get; set; }
    public string name { get; set; }
    public string mail { get; set; }
    public string legacyAccountName { get; set; }
    public string phone { get; set; }
}

A user can exist in either directory or in both. I want the results from both, with a preference for the LDAP values where there are any.

LdapResult { "Bob", "Bob Bobinson", "bob@bob.com" }
LdapResult { "Jerry", "Jerry Seinfield", "jerry@yesman.com" }
DatabaseResult { "Bob", "BOB BOBIN", "L1234", "(123) 456-7890" }
DatabaseResult { "Mary", "POPPINS, MARY", "L8394", "(555) 555-5555" }

I'd like the result in that scenario to be:

List<SearchResult> = [
    { "Bob", "Bob Bobinson", "bob@bob.com", "L1234", "(123) 456-7890" },
    { "Jerry", "Jerry Seinfield", "jerry@yesman.com", null, null },
    { "Mary", "POPPINS, MARY", null, "L8394", "(555) 555-5555" }
]

I have looked into concatenation followed by removing duplicates, unions, and joins, but all of them give priority to the first source list in the comparison, usually through a unique identifier that has to be present in that list.

Unions get close, but if an sso exists in list 1, the matching record in list 2 is skipped. Joins skip records in list 2 whose sso doesn't exist in list 1, and so on.

UPDATE 1/23 6:49 PM EST: sso is a unique identifier in this scenario. It is equivalent to a single sign on identifier in my organization.

The problem comes in when new users don't exist in legacy, old users don't exist in modern, and everyone else has both.

  • sso is a unique identifier
  • LDAP name will be preferred (that is actually the only property that is preferred--legacy systems produce names like "BOB BOBBIN" which is clearly not preferred but is usable if it's all I have).
  • Both LdapResult and DatabaseResult contain additional properties not in each other.

I changed samAccountName in this example to sso as samAccountName was a poor choice for the example on my part.

mrUlrik
  • Well you could use nullable types, but I can't say if that's the *right* way to do it. – Kredns Jan 23 '18 at 23:25
  • Why would there only be 3 results? How can we know that *Bob Bobinson* and *BOB BOBIN* are the same person? Can you update your question to give the criteria on which to determine a duplicate? – pmcilreavy Jan 23 '18 at 23:26
  • I guess I don't understand the question clearly. Are you merging 2 lists into 1, or comparing 2 lists with another one? – Mark Benningfield Jan 23 '18 at 23:30
  • @MarkBenningfield I am merging two lists into a new one. Consolidating data from both systems. – mrUlrik Jan 23 '18 at 23:56
  • @mrUlrik: OK, perhaps I have my "dense" hat on today, but is there a reason you can't write a method on the `SearchResult` class to merge the data according to your priorities? – Mark Benningfield Jan 24 '18 at 00:12
  • @MarkBenningfield I believe you are actually seeing my dense hat. That is essentially my question. I could pull this out into its own method, but I'm still in the same boat with the same question. How do I merge the data? I could use multiple loops, iterating through both lists twice, to make sure all records are in the new list, but I feel like there must be a more efficient way than that. – mrUlrik Jan 24 '18 at 00:18
  • I would have the Get methods return a List of a consistent Type and then use the LINQ Union method on the 2 Lists. You could write a comparer to deal with your LDAP over DB preference. https://alicebobandmallory.com/articles/2012/10/18/merge-collections-without-duplicates-in-c – Andrew Harris Jan 24 '18 at 03:38

2 Answers

Updated based on your latest updates.

The way I see it, I would probably go the route of writing custom code for this: a method that loops through both lists and compares them. If you want to do this in a more "fancy" style, I think something like this would work for you (I only used the primary key and one additional field, but the method should hold for other fields).

Step #1: Join the two lists and do a preferential select for each element (e.g. choose the LDAP value if it exists, otherwise use the DatabaseResult value).

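// Left outer join: keep every LDAP record and attach its database match, if any.
// ldapResult and dbResult are the LDAP and database result lists.
List<SearchResult> resultList = ldapResult.GroupJoin(dbResult,
        ld => ld.sso,
        db => db.sso,
        (ld, db) => new
        {
            Sso = ld.sso,
            Name = ld.name,
            Mail = ld.mail,
            // DefaultIfEmpty yields a single null when there is no database match.
            DbResult = db.DefaultIfEmpty()
        })
        .SelectMany(z =>
            z.DbResult.Select(
                db => new SearchResult
                {
                    sso = z.Sso,
                    // Prefer the LDAP name; fall back to the database name.
                    name = !string.IsNullOrEmpty(z.Name) ? z.Name : db?.name,
                    mail = z.Mail,
                    legacyAccountName = db?.legacyAccountName,
                    phone = db?.phone
                })).ToList();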

Step #2: Add the records that exist only in the DatabaseResult list.

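// Append database-only records: those whose sso is not already in resultList.
// ToList() materializes the query before resultList is modified.
resultList.AddRange(dbResult
        .Where(z => !resultList.Exists(y => y.sso == z.sso))
        .Select(z => new SearchResult
        {
            sso = z.sso,
            name = z.name,
            legacyAccountName = z.legacyAccountName,
            mail = null, // no LDAP record, so no mail address
            phone = z.phone
        })
        .ToList());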

Output results (using your provided class structure and test data):

foreach (SearchResult result in resultList)
{
    Console.WriteLine(result.sso + " - Name: " + result.name);
}

Console.ReadKey();

Output:

Bob - Name: Bob Bobinson

Jerry - Name: Jerry Seinfield

Mary - Name: POPPINS, MARY

Option #2: If you don't like the above, you could look into reflection, but this could have performance impacts. The link below covers some basics.

merging two objects in C#

Update: I forgot to give you the left outer join version in my first example, which you need in order not to lose data from your LDAP result list (a plain "Join" performs an inner join and would cut out "Jerry" in your example, since he is not in the database results). I also revamped my example to use your exact models.
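
Taken together, the two steps behave like a full outer join on sso. As a rough sketch of the same merge with one pass per list, a Dictionary keyed on sso can replace the Exists lookups. This is an alternative technique, not the GroupJoin approach itself. It assumes sso is unique and non-null within the database list (ToDictionary throws otherwise) and that the model properties are public. ResultMerger and MergeBySso are hypothetical names introduced only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class LdapResult
{
    public string sso { get; set; }
    public string name { get; set; }
    public string mail { get; set; }
}

public class DatabaseResult
{
    public string sso { get; set; }
    public string name { get; set; }
    public string legacyAccountName { get; set; }
    public string phone { get; set; }
}

public class SearchResult
{
    public string sso { get; set; }
    public string name { get; set; }
    public string mail { get; set; }
    public string legacyAccountName { get; set; }
    public string phone { get; set; }
}

// Hypothetical helper, introduced only for this sketch.
public static class ResultMerger
{
    public static List<SearchResult> MergeBySso(
        List<LdapResult> ldapResult, List<DatabaseResult> dbResult)
    {
        // Index the database records by sso for constant-time lookups.
        // Assumption: sso is unique and non-null in dbResult.
        Dictionary<string, DatabaseResult> dbBySso = dbResult.ToDictionary(d => d.sso);
        var seen = new HashSet<string>();
        var merged = new List<SearchResult>();

        // Pass 1: every LDAP record, enriched with its database match if any.
        foreach (LdapResult ld in ldapResult)
        {
            dbBySso.TryGetValue(ld.sso, out DatabaseResult match); // null when absent
            seen.Add(ld.sso);
            merged.Add(new SearchResult
            {
                sso = ld.sso,
                // Prefer the LDAP name; fall back to the database name.
                name = !string.IsNullOrEmpty(ld.name) ? ld.name : match?.name,
                mail = ld.mail,
                legacyAccountName = match?.legacyAccountName,
                phone = match?.phone
            });
        }

        // Pass 2: database records that had no LDAP counterpart.
        foreach (DatabaseResult d in dbResult)
        {
            if (seen.Contains(d.sso)) continue;
            merged.Add(new SearchResult
            {
                sso = d.sso,
                name = d.name,
                legacyAccountName = d.legacyAccountName,
                phone = d.phone
            });
        }

        return merged;
    }
}
```

Calling ResultMerger.MergeBySso(ldapResult, dbResult) keeps the LDAP records in their original order, followed by the database-only records.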

Nakorr
  • I am reading into AutoMapper. In regard to your first question that still requires list 1 to have an sso. If it does not contain an sso that is in list 2, the one in list 2 will not be included--though I think you've caught that in your edits. – mrUlrik Jan 24 '18 at 00:20
  • @mrUlrik Hopefully I didn't remove any helpful information, but I've revamped my response based on what you updated your post to say. Does the new logic make sense/help? – Nakorr Jan 24 '18 at 00:29
  • That does make sense and helps tremendously. I think I'd gotten into the mind set of simplification and could not see clearly in front of me anymore. So thank you. :D – mrUlrik Jan 24 '18 at 15:23
  • @mrUlrik I'm glad I could help. Please make sure you review the latest version of my post (just edited it as you responded, not sure if you saw the latest). I realized last night after I went to bed that I gave you an inner join, instead of left outer, which is what you want. The new example should give you the exact results you want. – Nakorr Jan 24 '18 at 15:28

If you're going to merge 2 lists, you have to touch each item in each list, but you only have to do it once.

The required code is almost all simple boilerplate stuff, so I'm not going to write up a whole class hierarchy here, but the gist is:

Have your SearchResult class implement the IEquatable<SearchResult> interface, and test for equality on the sso property. That way, you can always check your notional List<SearchResult> finalResults to see if that unique identity is already present.

It doesn't really matter whether you iterate all the way through the ldap list first, and then do the database list, or the other way around. Just do one, then the other.

For every item in the first list, create a SearchResult instance from the data and add it to the final results.

Then, for every item in the other list, create a SearchResult instance from the data, and see if it is in the final results list. If it isn't, just add it and get the next one.

If it is in the list, then call the SearchResult.Merge(SearchResult) method with your newly-created instance (from either the ldap result list or the database result list).

The Merge() method that you'll write checks which data is present and which is null, and assigns the properties according to your priorities. Since the instance is already in the final results list, you're done with that one. When you've gone through both the ldap list and the database list (one time each), you're finished.

This is more or less pseudo-code, because it references code elements that you'll have to write:

foreach (LdapResult r in ldapResults)
{
    finalResults.Add(new SearchResult(r));
}

foreach (DatabaseResult r in databaseResults)
{
    SearchResult sr = new SearchResult(r);
    int i = finalResults.IndexOf(sr);
    if (i > -1)
    {
        finalResults[i].Merge(sr);
    }
    else
    {
        finalResults.Add(sr);
    }
}
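
The loops above rely on three pieces left for you to write: the two SearchResult constructors, the IEquatable<SearchResult> implementation, and Merge(). Here is a minimal sketch of what they might look like, assuming the property names from the question made public. The nameFromLdap flag is a hypothetical addition, not something from the question; it is there so the LDAP name wins no matter which list is processed first.

```csharp
using System;
using System.Collections.Generic;

public class LdapResult
{
    public string sso { get; set; }
    public string name { get; set; }
    public string mail { get; set; }
}

public class DatabaseResult
{
    public string sso { get; set; }
    public string name { get; set; }
    public string legacyAccountName { get; set; }
    public string phone { get; set; }
}

public class SearchResult : IEquatable<SearchResult>
{
    public string sso { get; set; }
    public string name { get; set; }
    public string mail { get; set; }
    public string legacyAccountName { get; set; }
    public string phone { get; set; }

    // Hypothetical flag: true when the current name came from LDAP.
    private bool nameFromLdap;

    public SearchResult(LdapResult r)
    {
        sso = r.sso;
        name = r.name;
        mail = r.mail;
        nameFromLdap = !string.IsNullOrEmpty(r.name);
    }

    public SearchResult(DatabaseResult r)
    {
        sso = r.sso;
        name = r.name;
        legacyAccountName = r.legacyAccountName;
        phone = r.phone;
    }

    // Two results describe the same user when their sso values match.
    public bool Equals(SearchResult other) => other != null && sso == other.sso;

    public override bool Equals(object obj) => Equals(obj as SearchResult);

    public override int GetHashCode() => sso?.GetHashCode() ?? 0;

    // Fold the data from 'other' into this instance.
    public void Merge(SearchResult other)
    {
        // Take the other name if ours is missing, or if theirs is the LDAP one.
        bool takeOtherName = !string.IsNullOrEmpty(other.name)
            && (string.IsNullOrEmpty(name) || (other.nameFromLdap && !nameFromLdap));
        if (takeOtherName)
        {
            name = other.name;
            nameFromLdap = other.nameFromLdap;
        }

        // The remaining properties come from only one source: fill the gaps.
        mail = mail ?? other.mail;
        legacyAccountName = legacyAccountName ?? other.legacyAccountName;
        phone = phone ?? other.phone;
    }
}
```

List<T>.IndexOf picks up the IEquatable<SearchResult> implementation, but it is a linear scan, so for large lists a Dictionary<string, SearchResult> keyed on sso may be worth considering instead.
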
Mark Benningfield