I have two lists with the following values.

PirateShipList1

Date       | PirateShipName     | CrewMembers
---------------------------------------------
1/1/1800   | Cindy              | 5
1/2/1800   | TheIvan            | 20
1/3/1800   | TheTerrible        | 10

And PirateShipList2

Date       | PirateShipName     | CrewMembers
---------------------------------------------
1/1/1800   | Cindy              | 0
1/2/1800   | Cindy              | 0
1/3/1800   | Cindy              | 0
1/1/1800   | TheIvan            | 0
1/2/1800   | TheIvan            | 0
1/3/1800   | TheIvan            | 0
1/1/1800   | TheTerrible        | 0
1/2/1800   | TheTerrible        | 0
1/3/1800   | TheTerrible        | 0

I want to merge the two lists on Date and PirateShipName so that if an entry is present in List1, I take its CrewMembers value; otherwise I take List2's CrewMembers value.

The final list would then look like this:

Date       | PirateShipName     | CrewMembers
---------------------------------------------
1/1/1800   | Cindy              | 5
1/2/1800   | Cindy              | 0
1/3/1800   | Cindy              | 0
1/1/1800   | TheIvan            | 0
1/2/1800   | TheIvan            | 20
1/3/1800   | TheIvan            | 0
1/1/1800   | TheTerrible        | 0
1/2/1800   | TheTerrible        | 0
1/3/1800   | TheTerrible        | 10

I've found I can do this in a hacky way with the following:

List<PirateLedgers> Final = PirateShipList2.Union(PirateShipList1)
    .GroupBy(x => new { x.Date, x.PirateShipName })
    .Select(x => new PirateLedgers
    {
        Date = x.Key.Date,
        PirateShipName = x.Key.PirateShipName,
        CrewMembers = x.Sum(l => l.CrewMembers)
    }).ToList();

But I suspect that there is a smarter and better way of doing this with an actual join. Thank you in advance either way!

Uwe Keim
Ivan S
  • https://codereview.stackexchange.com – L.B Oct 10 '17 at 17:01
  • Your query and the description of it are completely different. The description is "merge the lists, preferring the values from list 1 over list 2", and the query is "merge the lists and sum the values". They are only the same because of the coincidence that in your particular example, all the values in list 2 are zero. What would happen if the values in list 2 were not zero? – Eric Lippert Oct 10 '17 at 17:02
  • How about modifying a **clone of PirateShipList2** and then running a **foreach loop over PirateShipList1**: if a record is found for that particular Date and PirateShipName, **replace CrewMembers**. I don't know how much better it would perform compared to your query; just my thoughts :) – Rohit Kumar Oct 10 '17 at 17:03
  • @EricLippert That's why it's my hacky solution. It won't work if the values in list 2 are nonzero (in this case they are). Sorry for the confusion. – Ivan S Oct 10 '17 at 17:04
  • `CrewMembers = Math.Max(x.Key.CrewMembers, x[1].CrewMembers)` ? – M.kazem Akhgary Oct 10 '17 at 17:06
  • Write an *IEqualityComparer* and use a HashSet. Add list1 to this hashset and then list2. – L.B Oct 10 '17 at 17:06
  • There's a commonly used LINQy extension overloading `Distinct` with a selector (see https://stackoverflow.com/questions/489258/linqs-distinct-on-a-particular-property). Using this, you can `Concat` your sequences and then `Distinct(x => new { x.Date, x.PirateShipName })`. The usual `IEnumerable` implementation of this `Distinct` overload biases towards the first occurrence of a duplicated item. – Oly Oct 10 '17 at 17:14
  • Are name-date pairs in each list guaranteed to be unique? – spender Oct 10 '17 at 17:16
  • TRY TO AVOID the common `.GroupBy(x => x.y).Select(y => y.First())` pattern for getting distinct values from a sequence. It has the overhead of creating _all_ groups instead of discarding repeated values. – Oly Oct 10 '17 at 17:18
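
The `IEqualityComparer` plus `HashSet` suggestion from the comments could look roughly like the sketch below. It is only an illustration: the shape of `PirateLedgers` (in particular the property types) is assumed, and `LedgerKeyComparer` and `LedgerMerge.PreferFirst` are hypothetical names that do not appear in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shape of the question's PirateLedgers type; the property types are a guess.
public class PirateLedgers
{
    public DateTime Date { get; set; }
    public string PirateShipName { get; set; } = "";
    public int CrewMembers { get; set; }
}

// Hypothetical comparer: two entries are equal when Date and PirateShipName match.
// CrewMembers is deliberately ignored.
public sealed class LedgerKeyComparer : IEqualityComparer<PirateLedgers>
{
    public bool Equals(PirateLedgers a, PirateLedgers b) =>
        ReferenceEquals(a, b) ||
        (a != null && b != null &&
         a.Date == b.Date && a.PirateShipName == b.PirateShipName);

    public int GetHashCode(PirateLedgers x) =>
        (x.Date, x.PirateShipName).GetHashCode();
}

public static class LedgerMerge
{
    // Adds list1 first, then list2. The set rejects any list2 entry whose key
    // is already present, so list1's CrewMembers value is the one that survives.
    public static List<PirateLedgers> PreferFirst(
        IEnumerable<PirateLedgers> list1, IEnumerable<PirateLedgers> list2)
    {
        var merged = new HashSet<PirateLedgers>(list1, new LedgerKeyComparer());
        merged.UnionWith(list2);
        return merged.ToList();
    }
}
```

Unlike the summing query, this keeps List1's value even when List2 holds non-zero values. `HashSet<T>` does not promise an enumeration order; `list1.Union(list2, new LedgerKeyComparer())` applies the same comparer and yields elements in first-seen order.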

3 Answers

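Give this a try:

PirateShipList1.Union(PirateShipList2)
    .GroupBy(x => new { x.Date, x.PirateShipName },
        (key, value) => value.OrderByDescending(x => x.CrewMembers).First());

Note that this keeps the entry with the largest CrewMembers in each group, so it matches the requirement only as long as List1's value is never smaller than List2's for the same key.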

This query should do it:

var query = 
    from s2 in PirateShipList2
    join s1 in PirateShipList1 on s2.PirateShipName equals s1.PirateShipName
    select new
    {
        s2.Date,
        s2.PirateShipName,
        CrewMembers = s1.Date == s2.Date ? s1.CrewMembers : s2.CrewMembers
    };

In method syntax:

PirateShipList2
    .Join(PirateShipList1, 
        x => x.PirateShipName, x=> x.PirateShipName,
        (s2, s1) => new
        {
            s2.Date,
            s2.PirateShipName,
            CrewMembers = s1.Date == s2.Date ? s1.CrewMembers : s2.CrewMembers
        });

EDIT: As @spender pointed out, the above solution joins on the name only, so it pairs each List2 row with every List1 row that has the same name (a Cartesian product within each name). So, here's another solution:

PirateShipList2
    .Select(s2 => new 
    { 
        s2, 
        s1 = PirateShipList1.FirstOrDefault(s1 => 
            s1.PirateShipName == s2.PirateShipName &&
            s1.Date == s2.Date)
    })
    .Select(x => new
    {
        x.s2.Date,
        x.s2.PirateShipName,
        CrewMembers = x.s1 != null ? x.s1.CrewMembers : x.s2.CrewMembers
    })
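
The same lookup can also be written as a left outer join on a composite key, which is closer to the "actual join" the question asks about. The following is a minimal sketch under assumptions: the shape of `PirateLedgers` (property types in particular) is guessed, and `LedgerJoin.LeftJoinPreferList1` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shape of the question's PirateLedgers type; the property types are a guess.
public class PirateLedgers
{
    public DateTime Date { get; set; }
    public string PirateShipName { get; set; } = "";
    public int CrewMembers { get; set; }
}

public static class LedgerJoin
{
    // Hypothetical helper: every row of list2 is kept; when list1 has a row with
    // the same Date and PirateShipName, list1's CrewMembers replaces list2's.
    public static List<PirateLedgers> LeftJoinPreferList1(
        IEnumerable<PirateLedgers> list1, IEnumerable<PirateLedgers> list2)
    {
        var query =
            from s2 in list2
            join s1 in list1
                on new { s2.Date, s2.PirateShipName }
                equals new { s1.Date, s1.PirateShipName }
                into matches                         // group join: zero or more list1 rows per list2 row
            let match = matches.FirstOrDefault()     // null when list1 has no such key
            select new PirateLedgers
            {
                Date = s2.Date,
                PirateShipName = s2.PirateShipName,
                CrewMembers = match != null ? match.CrewMembers : s2.CrewMembers
            };

        return query.ToList();
    }
}
```

In LINQ to Objects the join builds a hash lookup over `list1` once, instead of scanning `list1` for every row of `list2` as `FirstOrDefault` does. Like the query above, it only returns keys that exist in List2; a key present only in List1 would be dropped.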
Xiaoy312

Consider adding this common Distinct overload:

public static IEnumerable<TSource> Distinct<TSource, TKey>
(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
{
    HashSet<TKey> seen = new HashSet<TKey>();
    foreach (TSource element in source)
    {
        if (seen.Add(keySelector(element)))
        {
            yield return element;
        }
    }
}

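and then using it like this, keyed on both Date and PirateShipName (keying on the name alone would keep only one row per ship):

var final = PirateShipList1.Concat(PirateShipList2).Distinct(x => new { x.Date, x.PirateShipName });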

Note that this approach only works in memory (i.e. on IEnumerable, not at the IQueryable level).

This approach has the benefit that it preserves order, and only does the necessary computation - it discards anything it doesn't need from List2. It's also very declarative and readable (which is a huge benefit of LINQ in general).
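
To make the behaviour concrete, here is a small self-contained sketch of the Concat-then-distinct idea with a composite key. The `PirateLedgers` shape is assumed, and the helper is given the hypothetical name `DistinctByKey` (rather than `Distinct`) purely for illustration; `KeyedDistinct.Merge` is likewise an invented name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shape of the question's PirateLedgers type; the property types are a guess.
public class PirateLedgers
{
    public DateTime Date { get; set; }
    public string PirateShipName { get; set; } = "";
    public int CrewMembers { get; set; }
}

public static class KeyedDistinct
{
    // Yields the first element seen for each key and skips later duplicates.
    public static IEnumerable<TSource> DistinctByKey<TSource, TKey>(
        this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
    {
        var seen = new HashSet<TKey>();
        foreach (var element in source)
        {
            if (seen.Add(keySelector(element)))
            {
                yield return element;
            }
        }
    }

    // list1 comes first in the concatenation, so its entry wins for a shared key.
    public static List<PirateLedgers> Merge(
        IEnumerable<PirateLedgers> list1, IEnumerable<PirateLedgers> list2) =>
        list1.Concat(list2)
             .DistinctByKey(x => (x.Date, x.PirateShipName))
             .ToList();
}
```

The result lists List1's entries first, followed by the List2 entries whose keys were not already seen. If the ordering shown in the question is wanted, `.OrderBy(x => x.PirateShipName).ThenBy(x => x.Date)` can be appended before `ToList()`. On .NET 6 and later, the built-in `Enumerable.DistinctBy` accepts the same kind of key selector.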

Oly