104

I have a question on Union and Concat.

var a1 = (new[] { 1, 2 }).Union(new[] { 1, 2 });             // O/P : 1 2
var a2 = (new[] { 1, 2 }).Concat(new[] { 1, 2 });            // O/P : 1 2 1 2

var a3 = (new[] { "1", "2" }).Union(new[] { "1", "2" });     // O/P : "1" "2"
var a4 = (new[] { "1", "2" }).Concat(new[] { "1", "2" });    // O/P : "1" "2" "1" "2"

The above results are expected, but with List<T> I get the same result from both Union and Concat.

class X
{
    public int ID { get; set; }
}

class X1 : X
{
    public int ID1 { get; set; }
}

class X2 : X
{
    public int ID2 { get; set; }
}

var lstX1 = new List<X1> { new X1 { ID = 10, ID1 = 10 }, new X1 { ID = 10, ID1 = 10 } };
var lstX2 = new List<X2> { new X2 { ID = 10, ID2 = 10 }, new X2 { ID = 10, ID2 = 10 } };
        
var a5 = lstX1.Cast<X>().Union(lstX2.Cast<X>());     // O/P : a5.Count() = 4
var a6 = lstX1.Cast<X>().Concat(lstX2.Cast<X>());    // O/P : a6.Count() = 4

But both behave the same in the case of List<T>.

Any suggestions please?

Tim
Prasad Kanaparthi

3 Answers

133

Union returns distinct values. By default it compares items with the default equality comparer, which for a class that does not override Equals compares references. Your items are all separate instances, so they are all considered different. Casting to the base type X does not change the reference.

If you override Equals and GetHashCode (both are used to select distinct items), then items are no longer compared by reference:

class X
{
    public int ID { get; set; }

    public override bool Equals(object obj)
    {
        X x = obj as X;
        if (x == null)
            return false;
        return x.ID == ID;
    }

    public override int GetHashCode()
    {
        return ID.GetHashCode();
    }
}

Items with different ID values are still considered different. If you provide several items with the same ID, you will see the difference between Union and Concat:

var lstX1 = new List<X1> { new X1 { ID = 1, ID1 = 10 }, 
                           new X1 { ID = 10, ID1 = 100 } };
var lstX2 = new List<X2> { new X2 { ID = 1, ID2 = 20 }, // ID changed here
                           new X2 { ID = 20, ID2 = 200 } };

var a5 = lstX1.Cast<X>().Union(lstX2.Cast<X>());  // 3 distinct items
var a6 = lstX1.Cast<X>().Concat(lstX2.Cast<X>()); // 4

Your initial samples work because integers are value types and are compared by value, and strings override Equals to compare by value as well.
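
To make the default comparison concrete, here is a minimal sketch of what the default equality comparer does for each kind of type. The class name `Plain` is hypothetical and exists only for this illustration; it deliberately does not override Equals or GetHashCode.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical class for illustration: no Equals/GetHashCode override.
class Plain
{
    public int ID { get; set; }
}

static class DefaultEqualityDemo
{
    static void Main()
    {
        var p1 = new Plain { ID = 10 };
        var p2 = new Plain { ID = 10 };

        // Two separate instances with the same ID are not equal by default.
        Console.WriteLine(EqualityComparer<Plain>.Default.Equals(p1, p2)); // False
        Console.WriteLine(EqualityComparer<Plain>.Default.Equals(p1, p1)); // True

        // int is a value type and is compared by value.
        Console.WriteLine(EqualityComparer<int>.Default.Equals(10, 10));   // True

        // Two distinct string instances with the same content compare equal.
        string s1 = new string('1', 1);
        string s2 = new string('1', 1);
        Console.WriteLine(ReferenceEquals(s1, s2));                         // False
        Console.WriteLine(EqualityComparer<string>.Default.Equals(s1, s2)); // True

        // Union only removes the items that are the same reference.
        Console.WriteLine(new[] { p1, p2 }.Union(new[] { p1, p2 }).Count()); // 2
    }
}
```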

Sergey Berezovskiy
  • 4
    Even if it wasn't comparing references but e.g. the IDs within, there would still be four items as the IDs are different. – Rawling Nov 16 '12 at 13:32
  • @Swani nope, they are not. I think you didn't change the ID of the first item in the second collection, as I stated above – Sergey Berezovskiy Nov 16 '12 at 13:47
  • @Swani then you haven't overridden Equals and GetHashCode, as I stated above – Sergey Berezovskiy Nov 16 '12 at 14:09
  • @lazyberezovsky, I agree with your answer, but I am still not happy with the comments. If you execute my sample code you can see the same result for 'a5' and 'a6'. I am not looking for a solution; I want to know why 'Concat' and 'Union' behave the same in that situation. Please reply. – Prasad Kanaparthi Nov 16 '12 at 17:29
  • 4
    @Swani sorry, was afk. `x.Union(y)` is the same as `x.Concat(y).Distinct()`, so the only difference is the `Distinct` step. How does LINQ select distinct (i.e. different) objects in the concatenated sequences? In your sample code (from the question), LINQ compares objects by reference (i.e. address in memory). When you create a new object via the `new` operator, it allocates memory at a new address. So when you have four newly created objects, their addresses are all different and all objects are distinct. Thus `Distinct` returns every object in the sequence. – Sergey Berezovskiy Nov 16 '12 at 19:45
  • @Swani so how can you compare objects that are allocated at different addresses in memory but have the same value? You can either override the `GetHashCode` and `Equals` methods of the object (LINQ uses these methods to compare objects), or pass an `IEqualityComparer` to the operation (see the answer by Tim). So you need to **change your sample code**: override those two methods to compare objects by value, **and** make the values of the objects equal. – Sergey Berezovskiy Nov 16 '12 at 19:48
59

Concat literally returns the items from the first sequence followed by the items from the second sequence. If you use Concat on two 2-item sequences, you will always get a 4-item sequence.

Union is essentially Concat followed by Distinct.

In your first two cases, you end up with 2-item sequences because, between them, each pair of input sequences has exactly two distinct items.

In your third case, you end up with a 4-item sequence because all four items in your two input sequences are distinct.
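
The relationship described above can be checked with a small sketch. The input arrays here are made up for illustration and are not taken from the question.

```csharp
using System;
using System.Linq;

static class UnionVsConcatDemo
{
    static void Main()
    {
        // Illustrative inputs with a duplicate inside the first array
        // and an overlap between the two arrays.
        var first = new[] { 1, 2, 2 };
        var second = new[] { 2, 3 };

        var concat = first.Concat(second);
        var union = first.Union(second);
        var concatDistinct = first.Concat(second).Distinct();

        Console.WriteLine(string.Join(" ", concat));            // 1 2 2 2 3
        Console.WriteLine(string.Join(" ", union));             // 1 2 3
        Console.WriteLine(union.SequenceEqual(concatDistinct)); // True
    }
}
```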

Rawling
19

Union and Concat behave the same here because Union cannot detect duplicates among X instances without a custom IEqualityComparer<X> (or an Equals/GetHashCode override on X). By default it only checks whether both are the same reference.

public class XComparer: IEqualityComparer<X>
{
    public bool Equals(X x1, X x2)
    {
        if (object.ReferenceEquals(x1, x2))
            return true;
        if (x1 == null || x2 == null)
            return false;
        return x1.ID.Equals(x2.ID);
    }

    public int GetHashCode(X x)
    {
        return x.ID.GetHashCode();
    }
}

Now you can use it in the overload of Union:

var comparer = new XComparer();
var a5 = lstX1.Cast<X>().Union(lstX2.Cast<X>(), comparer);
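
As a rough end-to-end sketch, this is how a comparer of this kind behaves on the data from the question, where every item has ID = 10. The name `IdComparer` is a hypothetical stand-in that follows the same idea as the comparer above; the classes are repeated only so the snippet is self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class X { public int ID { get; set; } }
class X1 : X { public int ID1 { get; set; } }
class X2 : X { public int ID2 { get; set; } }

// Hypothetical comparer: two X instances are equal when their IDs match.
class IdComparer : IEqualityComparer<X>
{
    public bool Equals(X a, X b)
    {
        if (ReferenceEquals(a, b)) return true;
        if (a == null || b == null) return false;
        return a.ID == b.ID;
    }

    public int GetHashCode(X x) => x == null ? 0 : x.ID.GetHashCode();
}

static class ComparerDemo
{
    static void Main()
    {
        var lstX1 = new List<X1> { new X1 { ID = 10, ID1 = 10 }, new X1 { ID = 10, ID1 = 10 } };
        var lstX2 = new List<X2> { new X2 { ID = 10, ID2 = 10 }, new X2 { ID = 10, ID2 = 10 } };
        var comparer = new IdComparer();

        // Without a comparer, four separate instances stay distinct.
        Console.WriteLine(lstX1.Cast<X>().Union(lstX2.Cast<X>()).Count());           // 4
        // With the comparer, all four share ID = 10, so only one survives.
        Console.WriteLine(lstX1.Cast<X>().Union(lstX2.Cast<X>(), comparer).Count()); // 1
        // Concat never removes anything.
        Console.WriteLine(lstX1.Cast<X>().Concat(lstX2.Cast<X>()).Count());          // 4
    }
}
```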
Tim Schmelter