Given two lists of different objects, how to combine values from each into a new list based on a common Id, efficiently?

Question

What I am trying to do is combine two lists of different types into a new type, on the Id property of each. Both lists have different properties that I need in the new list.

This snippet is working already, but I am not happy with the performance. Assuming both lists are the same length, and in whatever order they need to be, is it possible to do this more efficiently?

class A
{
    public int Id { get; set; }
    public string stuffA { get; set; }
    //other properties that we aren't using
}
class B
{
    public int Id { get; set; }
    public string stuffB { get; set; }
    //other properties we aren't using
}
class C
{
    public int Id { get; set; }
    public string stuffA { get; set; }
    public string stuffB { get; set; }
}
public List<C> getList(List<A> listA, List<B> listB)
{
    var listC = new List<C>();
    
    foreach(var a in listA)
    {
        var b = listB.Where(x => x.Id == a.Id);
        listC.Add(new C{ Id = a.Id, stuffA = a.stuffA, stuffB = b.stuffB});
    }
    return listC;
}

I've looked into the Enumerable.Zip method, which pairs up two lists in the order provided, but I can't seem to use that with objects. The only examples I can make work are with primitive types.

I've seen this question: How merge two lists of different objects? but this doesn't create a new type, only an anonymous type containing the old lists (I believe).

Any ideas?

DavidG · Answer 1 · 2018-04-18T23:02:46.060

1

Your looking for a simple Linq join, the order of items doesn't matter. For example:

public List<C> getList(List<A> listA, List<B> listB)
{
    var listC = from a in listA
                join b in listB on a.Id equals b.Id
                select new C 
                {
                    Id = a.Id,
                    stuffA = a.stuffA,
                    stuffB = b.stuffB
                };

    return listC.ToList();
}

If you can ensure that both lists are in the same order with the same number of elements, you can loop through the lists and join based on index. This eliminates any searches required:

public List<C> getList(List<A> listA, List<B> listB)
{
    var listC = new List<C>();

    for(var i = 0; i < listA.Count; i++)
    {
        listC.Add(new C 
        {
            Id = listA[i].Id,
            stuffA = listA[i].stuffA,
            stuffB = listB[i].stuffB
        };
    }

    return listC;
}

edited Apr 18 '18 at 23:02

answered Apr 18 '18 at 22:36

DavidG

113,891
12
217
223

This saved me about a second (33 to 32), and it looks nicer than mine too. I'll run more tests, but I'm not convinced this is entirely more efficient, though. That second difference might be from elsewhere. – Stanfrancisco Apr 18 '18 at 22:55
Do you have control over the order of the list? And do both lists have the same number of elements and matching IDs? – DavidG Apr 18 '18 at 22:58
Yes to both questions @DavidG – Stanfrancisco Apr 18 '18 at 22:58
How about option 2? – DavidG Apr 18 '18 at 23:02
You wouldn't believe it - but that one takes longer for some reason. Up to 38. Does accessing an index perhaps take a little time then? – Stanfrancisco Apr 18 '18 at 23:11
I would guess there's something not right with your metrics. Are you allowing for warm up? Is it compiled in release instead of debug mode? What are you using to do the timing? How many elements do you have? – DavidG Apr 18 '18 at 23:12
I'm using Stopwatch - it starts right before I call the function, and stops after the function is executed. You're right though the timing is probably funky. But yes I am compiled in debug instead of release - I set a breakpoint to check the timer when it returns. – Stanfrancisco Apr 18 '18 at 23:15
Don't use debug mode, it's not going to give you consistent results. – DavidG Apr 18 '18 at 23:16
Every test I'm running is giving the best results on your first answer, but that could be for a number of reasons I assume. It's a good start, and consistently more efficient than the route I chose originally. – Stanfrancisco Apr 18 '18 at 23:22
The linq join method under the hood probably implements a hash table as @zdravko danev mentions in his answer below. If it did not it would implement nested loops like your original code and display similar performance. I would recommend explicitly trying the hashtable approach and compare that as well. – user7396598 Apr 18 '18 at 23:33
@user7396598 I'm doing that right now, to see the results. – Stanfrancisco Apr 18 '18 at 23:33

score 0 · Answer 2 · answered Apr 18 '18 at 22:37

0

This sounds a bit like a homework question but anyway... the trivial solution has a complexity of O(n^2). if efficiency is your goal, you need to use a structure that allows for fast searching like a hashtable, then your solution will be O(n). so add the elements of the first array to a hash table, then loop thru the second array and compose the result.

answered Apr 18 '18 at 22:37

Z .

12,657
1
31
56

I tried this and it seems less efficient than the method in the answer above, but maybe it becomes more efficient when I have more data in the set. I'll continue researching. – Stanfrancisco Apr 18 '18 at 23:54

Given two lists of different objects, how to combine values from each into a new list based on a common Id, efficiently?

2 Answers2